Clean_fasta_header

Alert:


The CyVerse App Store is currently being restructured, and apps are being moved to an HPC environment. During this transition, users may occasionally be unable to locate or use apps that are listed in our tutorials. In many cases, these apps can be found by using the search bar at the top of the Apps window in the DE. To improve your chances of finding an app, do not search for the full app name and version number; search only for the portion that refers to the app's function or origin (e.g. 'SOAPdenovo' instead of 'SOAPdenovo-Trans 1.01').

Also, as part of the 2.8 app categorization, a number of apps were deprecated and are no longer available, and there is no longer an Archive category. You can look for a suitable replacement in the List of Applications, or search for an app name or the tool an app uses in the Apps window search field. If you need an app reinstated, please contact support@cyverse.org.

Tutorial under review

For an introduction to using the DE, see Using the Discovery Environment.

Please work through the tutorial and add your comments at the bottom of this page, or email comments to support@cyverse.org. Thank you.

Rationale and background

The Clean_fasta_header app removes the "|" character and everything after it from each header line of a FASTA file. Many bioinformatics tools do not handle the special character "|" well in FASTA headers, so it is important to remove it. This app helps you remove this one special character.
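The header-cleaning step can be illustrated in a few lines of Python. This is a minimal sketch, not the app's actual source code: it assumes the app truncates each header line at the first "|" and leaves sequence lines untouched. The function name `clean_fasta_text` and the script name `clean_fasta_header.py` are hypothetical.

```python
import sys


def clean_fasta_text(fasta_text: str) -> str:
    """Truncate every FASTA header line at its first '|' character.

    Sequence lines are passed through unchanged. Hypothetical
    re-implementation for illustration only.
    """
    cleaned_lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Drop the first '|' and everything that follows it
            line = line.split("|", 1)[0]
        cleaned_lines.append(line)
    result = "\n".join(cleaned_lines)
    # Preserve a trailing newline if the input had one
    if fasta_text.endswith("\n"):
        result += "\n"
    return result


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python clean_fasta_header.py input.fasta output.fas
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.write(clean_fasta_text(src.read()))
```

For example, a header such as ">seq1|some description" becomes ">seq1", while the sequence lines below it are copied as-is. Splitting with a maximum of one split means that headers containing several "|" characters are still cut at the first one.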

Prerequisites

  1. A CyVerse account (Register for a CyVerse account at https://user.cyverse.org/).

  2. An up-to-date Java-enabled web browser. (Firefox recommended. If you wish to work with your own large datasets and upload them using iCommands, Chrome is not suitable due to its problems using 64-bit Java.)

  3. Input: 
    1. Reference genome (FASTA file) from the DE
  4. Output Folder name: Name of the output folder (default "output")

Test/sample data

This tutorial uses the test data that is stored in the Data Store at Community Data > iplantcollaborative > example_data > clean_fasta_header          

  1. Input:
    1. Reference genomes: Acromyrmex_echinatior
  2. Output Folder name: Use the default folder name, "output"

Output

  1. logs
  2. Output
    1. genome.cleaned.fas
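After the analysis completes, you may want to confirm that no header in genome.cleaned.fas still contains "|". The following is a minimal Python sketch that assumes you have downloaded the file from the Data Store to your local machine; the function name `headers_with_pipe` and the script name `check_headers.py` are hypothetical.

```python
import sys


def headers_with_pipe(fasta_text: str) -> list:
    """Return every FASTA header line that still contains a '|' character."""
    return [
        line
        for line in fasta_text.splitlines()
        if line.startswith(">") and "|" in line
    ]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_headers.py genome.cleaned.fas
    with open(sys.argv[1]) as fasta_file:
        leftovers = headers_with_pipe(fasta_file.read())
    if leftovers:
        print(f"{len(leftovers)} header(s) still contain '|'")
    else:
        print("All headers are clean")
```

An empty result means every header was cleaned. Only lines starting with ">" are inspected, so sequence lines are never flagged.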