fastaRename is a utility that replaces the headers in a FASTA file with simple, incremental sequence IDs, which should help avoid header-related problems when the file is used as input to other apps. It is intended as a step in the Cluster Orthologs and Paralogs and Assemble Custom Gene Sets workflow, but can also be used to rename FASTA sequences for other purposes. Sequences are renamed based on a user-defined two-letter genus-species abbreviation. Three files are produced:

1- .fasta - renamed sequences
2- .gg - new sequence names (for downstream OrthoMCL input in the Cluster Orthologs and Paralogs and Assemble Custom Gene Sets workflow)
3- .map - maps new sequence names to the original FASTA headers, so that renamed sequences can be associated with their original headers if needed
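
The renaming step described above can be sketched in Python as follows. This is only an illustration, not the app's actual code: the function name `rename_fasta`, the ID pattern (abbreviation, underscore, zero-padded counter), the tab-separated .map layout, and the single-line `abbreviation: id id ...` layout of the .gg file are all assumptions made for the example.

```python
def rename_fasta(in_path, abbrev, out_prefix):
    """Rename FASTA headers to incremental IDs; write .fasta, .gg and .map files.

    Hypothetical helper: the ID pattern and file layouts are assumptions,
    not the formats guaranteed by fastaRename itself.
    """
    new_ids = []
    with open(in_path) as src, \
            open(f"{out_prefix}.fasta", "w") as fasta_out, \
            open(f"{out_prefix}.map", "w") as map_out:
        for line in src:
            if line.startswith(">"):
                # Assumed ID scheme: abbreviation plus a zero-padded counter.
                new_id = f"{abbrev}_{len(new_ids) + 1:06d}"
                new_ids.append(new_id)
                fasta_out.write(f">{new_id}\n")
                # Assumed map layout: new ID, a tab, then the original header.
                map_out.write(f"{new_id}\t{line[1:].rstrip()}\n")
            else:
                # Sequence lines are copied through unchanged.
                fasta_out.write(line)
    # Assumed .gg layout: one line listing every new ID for this species.
    with open(f"{out_prefix}.gg", "w") as gg_out:
        gg_out.write(f"{abbrev}: {' '.join(new_ids)}\n")
    return new_ids
```

For example, `rename_fasta("proteins.fa", "At", "At_renamed")` would write `At_renamed.fasta`, `At_renamed.gg` and `At_renamed.map` next to each other; the file names here are placeholders.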

...

  • Within the workflow referenced above, each input FASTA file is expected to contain sequences from a single species, presumably its entire protein-encoding gene repertoire. This is why a two-letter abbreviation is suggested. If you use the app to rename sequences outside this workflow, choose an abbreviation that makes sense for your experimental design.
  • It is a good idea to keep track of the number of sequences and headers in your input files and to compare them with the outputs, to ensure that the output faithfully represents the input.
  • Please visit Cluster Orthologs and Paralogs and Assemble Custom Gene Sets to see how fastaRename fits into the larger workflow.
  • flattenClusters 1.0 can be used to map renamed sequences back to the original FASTA headers.
  • App adapted from a Perl script originally written by Chih-Horng Kuo.
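
The input-versus-output check suggested in the notes above could be done with a few lines of Python. This is a minimal sketch: the helper names `count_headers` and `outputs_match_input` are hypothetical, and it assumes the .map file holds one non-empty line per renamed sequence.

```python
def count_headers(path):
    """Count FASTA records by counting lines that start with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def outputs_match_input(input_fasta, renamed_fasta, map_file):
    """Return True when input, renamed FASTA and .map agree on record count."""
    n_in = count_headers(input_fasta)
    n_out = count_headers(renamed_fasta)
    # Assumption: the .map file has one non-empty line per sequence.
    with open(map_file) as handle:
        n_map = sum(1 for line in handle if line.strip())
    return n_in == n_out == n_map
```

A `False` result only signals that the counts differ; it does not say which records are missing, so the files would still need to be inspected by hand.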

...
