Split FASTA file
Split a FASTA file into sub-files with equal numbers of sequence records.
- To use Split FASTA file, import your data in FASTA format.
- Set the number of sequence records to include in each file. This determines how many files are produced. If you aren't sure how many records are in your large FASTA file, you can use the GNU grep app to count them: enter ">" as the pattern to match and check the box that says "Only a count of selected lines ..."
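For readers working outside the Discovery Environment, the record count can also be obtained with a few lines of Python. This is a minimal sketch, not the app's own code: the function names count_fasta_records and subfile_count are hypothetical, and it assumes every record begins with a header line starting with ">".

```python
from typing import Iterable


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records by counting header lines (those starting with '>')."""
    return sum(1 for line in lines if line.startswith(">"))


def subfile_count(total_records: int, records_per_file: int) -> int:
    """Number of non-empty subfiles needed to hold all records (ceiling division)."""
    if records_per_file < 1:
        raise ValueError("records_per_file must be at least 1")
    return -(-total_records // records_per_file)


# Small in-memory example (made-up sequences, for illustration only).
example_lines = [">seq1\n", "ACGT\n", ">seq2\n", "GGCC\n", ">seq3\n", "TTAA\n"]
total = count_fasta_records(example_lines)
print(total, subfile_count(total, 2))
```

To count records in a real file, pass an open file handle instead of the list, for example count_fasta_records(open("CFLO_1.fa")).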
Input test data for this app appears directly in the Discovery Environment in the Data window under
Community Data -> iplantcollaborative -> example_data -> splitfasta
Use the CFLO_1.fa file as input.
Parameters Used in App
Change the 'number of sequence records per subfile' to 10.
This explanation is based on the Output test data shown above in the 'Test Data' section. The main output directory contains:
- Outputs will consist of files named with the selected output prefix followed by a number (e.g. sequences.0). Each file will contain the number of sequences specified (in this case 10). The last file will be empty, and the file before it may contain fewer sequences than specified (this happens when the original number of sequences is not evenly divisible by the user-specified number of records per subfile).
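The splitting behaviour described for the numbered subfiles can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the app's implementation: the names split_fasta_records and write_subfiles are hypothetical, the default prefix "sequences" is taken from the example file name above, and unlike the app this sketch does not write a trailing empty file.

```python
from pathlib import Path
from typing import Iterable, Iterator, List


def split_fasta_records(lines: Iterable[str], records_per_file: int) -> Iterator[List[str]]:
    """Yield lists of lines, each holding at most records_per_file FASTA records."""
    if records_per_file < 1:
        raise ValueError("records_per_file must be at least 1")
    chunk: List[str] = []
    records_in_chunk = 0
    for line in lines:
        if line.startswith(">"):
            # A new record starts here; close the current chunk if it is full.
            if records_in_chunk == records_per_file:
                yield chunk
                chunk = []
                records_in_chunk = 0
            records_in_chunk += 1
        chunk.append(line)
    if chunk:
        # The final chunk may hold fewer records than requested.
        yield chunk


def write_subfiles(chunks: Iterable[List[str]], out_dir: str, prefix: str = "sequences") -> List[Path]:
    """Write each chunk to out_dir/prefix.N (N = 0, 1, 2, ...) and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, chunk in enumerate(chunks):
        path = out / f"{prefix}.{index}"
        path.write_text("".join(chunk))
        paths.append(path)
    return paths
```

For example, write_subfiles(split_fasta_records(open("CFLO_1.fa"), 10), "split_output") would write sequences.0, sequences.1, and so on into a hypothetical split_output directory, with only the last file allowed to hold fewer than 10 records.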