...
A utility that replaces the headers in a FASTA file with simple, incremental sequence IDs, with the aim of avoiding header-related problems when the file is used as input to other apps. fastaRename is intended as a step in the Cluster Orthologs and Paralogs and Assemble Custom Gene Sets workflow, but it can also be used to rename FASTA sequences for other purposes. Sequences are renamed based on a user-defined two-letter genus-species abbreviation. Three files are produced:
1- .fasta - renamed sequences
2- .gg - new sequence names (for downstream OrthoMCL input in the Cluster Orthologs and Paralogs and Assemble Custom Gene Sets workflow)
3- .map - maps new sequence names to the original FASTA headers, so that renamed sequences can be associated with their original headers if needed.
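
To make the renaming concrete, here is a minimal Python sketch of the idea. It is not the app's actual code (the app is adapted from a Perl script). The function name `rename_fasta`, the `Xx_N` ID pattern, the tab-delimited .map layout, and the single-line `Xx: id id ...` .gg layout are all assumptions made for illustration; check the real output files for the exact formats.

```python
def rename_fasta(lines, abbrev):
    """Rename FASTA headers to incremental IDs (illustrative sketch only).

    Returns (renamed_lines, gg_line, map_lines). The ID pattern and the
    .gg/.map layouts used here are assumptions, not the app's confirmed formats.
    """
    renamed, ids, mapping = [], [], []
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            count += 1
            new_id = f"{abbrev}_{count}"              # assumed ID pattern, e.g. Hs_1
            ids.append(new_id)
            mapping.append(f"{new_id}\t{line[1:]}")   # new ID <tab> original header
            renamed.append(f">{new_id}")
        elif line:
            renamed.append(line)                      # sequence lines pass through unchanged
    gg_line = f"{abbrev}: " + " ".join(ids)           # assumed genome-gene layout
    return renamed, gg_line, mapping


if __name__ == "__main__":
    example = [">gi|1 protein A\n", "MKV\n", ">gi|2 protein B\n", "MLL\n"]
    renamed, gg_line, mapping = rename_fasta(example, "Hs")
    print("\n".join(renamed))
    print(gg_line)
    print("\n".join(mapping))
```

In this sketch the sequence data is never modified; only header lines change, which is why the mapping back to the original headers can be a simple two-column lookup.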
...
- As part of the workflow referenced above, it is intended that an input fasta file contain sequences from a single species, presumably the entire protein-encoding gene repertoire. This is why a two-letter abbreviation is suggested. If using the app to rename sequences beyond the scope of this workflow, choose an abbreviation that makes sense for your experimental design.
- It is a good idea to keep track of the number of sequences and headers in your input files and compare them with the outputs, to confirm that the output faithfully represents the input.
- Please visit Cluster Orthologs and Paralogs and Assemble Custom Gene Sets to see how fastaRename fits into the larger workflow.
- flattenClusters 1.0 can be used to map renamed sequences back to the original FASTA headers.
- App adapted from a Perl script originally written by Chih-Horng Kuo.
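
The input-versus-output check suggested in the notes above could be sketched in Python as follows. The helper names `count_headers` and `outputs_match_input` are hypothetical, and the sketch assumes the .map file holds one non-empty line per renamed sequence, which the description does not confirm.

```python
def count_headers(lines):
    """Count FASTA header lines (those starting with '>')."""
    return sum(1 for line in lines if line.startswith(">"))


def outputs_match_input(input_lines, renamed_lines, map_lines):
    """Return True if input, renamed output, and map agree on sequence count.

    Assumes one non-empty .map line per sequence (an unverified assumption).
    """
    n_in = count_headers(input_lines)
    n_out = count_headers(renamed_lines)
    n_map = sum(1 for line in map_lines if line.strip())
    return n_in == n_out == n_map


if __name__ == "__main__":
    original = [">gi|1 protein A", "MKV", ">gi|2 protein B", "MLL"]
    renamed = [">Hs_1", "MKV", ">Hs_2", "MLL"]
    mapping = ["Hs_1\tgi|1 protein A", "Hs_2\tgi|2 protein B"]
    print(outputs_match_input(original, renamed, mapping))
```

With real files, the same functions can be applied to lists of lines read from the input FASTA, the .fasta output, and the .map output.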
...