BUSCO-v2.0 in the Discovery Environment
The iPlant App Store is currently being restructured, and apps are being moved to an HPC environment. During this transition, users may occasionally be unable to locate or use apps that are listed in our tutorials. In many cases, these apps can be found with the search bar at the top of the Apps window in the DE. To improve the chance of a successful search, search for only the portion of the app name that refers to its function or origin rather than the full name and version number (e.g. 'SOAPdenovo' instead of 'SOAPdenovo-Trans 1.01'). In critical cases, please report your concern to the iPlant Ask forum or to support@iplantcollaborative.org. Thank you for your patience.
The DE Quick Start tutorial provides an introduction to basic DE functionality and navigation.
Rationale and background:
BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs
Felipe A. Simão, Robert M. Waterhouse, Panagiotis Ioannidis, Evgenia V. Kriventseva, & Evgeny M. Zdobnov (Zdobnov’s Computational Evolutionary Genomics Group)
Pre-Requisites
- A CyVerse account. (Register for a CyVerse account at user.cyverse.org.)
- Mandatory arguments
  - Output folder name (name used for the run; it is appended to the names of all temporary files)
  - Input file (genome assembly, gene set, or transcript set file in FASTA format)
  - Lineage data (location of the BUSCO lineage data to use. You can select the BUSCO profile files for your species of interest from the Data window under Community Data -> iplantcollaborative -> example_data -> BUSCO.sample.data)
  - Mode of analysis (genome, protein, or trans (transcriptome). Default: genome)
- Optional arguments
  - species (If your species is not in the list, selecting a closely related species usually produces better results.)
  - e-value (Use a custom BLAST e-value cutoff. Default: 0.03)
  - long (Performs full optimization for Augustus gene finding training. Default: Off)
Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> BUSCO.sample.data
Execute BUSCO with the following input data:
- Output folder - run_example
- Input file - target.fa
- lineage data - example
- mode - genome (default)
- species - fly
- e-value - 0.03 (default)
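For readers who also run BUSCO outside the DE, the example settings above roughly correspond to a command line like the one assembled below. This is a minimal sketch, not the DE app's actual wrapper: the script name `BUSCO.py`, the helper `build_busco_command`, and the mapping of the DE fields onto the `-i`, `-o`, `-l`, `-m`, `-sp`, `-e`, and `--long` options are assumptions based on the BUSCO v2 command-line interface. Check the option names and accepted mode values against the manual for your installation before relying on them.

```python
import shlex


def build_busco_command(input_file, output_name, lineage, mode="genome",
                        species=None, evalue=None, long_mode=False):
    """Assemble a BUSCO v2-style argument list (hypothetical helper)."""
    cmd = [
        "python", "BUSCO.py",   # assumed script name for a local BUSCO v2 install
        "-i", input_file,       # input FASTA file
        "-o", output_name,      # name for the run
        "-l", lineage,          # location of the lineage data
        "-m", mode,             # mode of analysis
    ]
    if species is not None:
        cmd += ["-sp", species]     # Augustus species
    if evalue is not None:
        cmd += ["-e", str(evalue)]  # BLAST e-value cutoff
    if long_mode:
        cmd.append("--long")        # full Augustus optimization
    return cmd


# Values taken from the example run listed above.
example = build_busco_command(
    "target.fa", "run_example", "example",
    mode="genome", species="fly", evalue=0.03,
)
print(" ".join(shlex.quote(part) for part in example))
```

Building the command as a list keeps each argument separate, so it can be passed to `subprocess.run` without shell quoting problems.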
Results
Successful execution of the BUSCO assessment pipeline creates a directory named run_example along with a logs directory. The run_example directory contains several files and directories:
1- Files
- short_summary_run_sample.txt - Contains a plain text summary of the results in BUSCO notation. Also gives a brief breakdown of the metrics.
- full_table_run_sample.txt - Contains the complete results in a tabular format with scores and lengths of BUSCO matches, and coordinates (for genome mode) or gene/protein IDs (for transcriptome or proteins mode).
- missing_busco_list_run_sample.tsv - Contains a list of missing BUSCOs.
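The summary and table files described above are plain text, so they are easy to post-process once downloaded from the DE. The sketch below is one possible approach, not part of BUSCO itself. It assumes that the summary contains a one-line notation of the form `C:..%[S:..%,D:..%],F:..%,M:..%,n:..`, that the full table is tab-separated with `#` comment lines, and that the status is in the second column. The function names and the sample text are invented for illustration and are not real BUSCO output.

```python
import re
from collections import Counter

# Assumed layout of the one-line BUSCO notation in the short summary.
NOTATION = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_notation(text):
    """Return the percentages and BUSCO count found in text, or None if absent."""
    match = NOTATION.search(text)
    if match is None:
        return None
    result = {key: float(match.group(key)) for key in "CSDFM"}
    result["n"] = int(match.group("n"))
    return result


def count_statuses(lines):
    """Tally the second (status) column of a full table, skipping '#' lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 1:
            counts[fields[1]] += 1
    return counts


# Invented sample text for illustration only; not real BUSCO output.
sample_summary = "\tC:95.0%[S:93.0%,D:2.0%],F:3.0%,M:2.0%,n:100\n"
sample_table = [
    "# Busco id\tStatus\tContig\tStart\tEnd\tScore\tLength\n",
    "BUSCO_A\tComplete\tcontig1\t10\t500\t350.2\t160\n",
    "BUSCO_B\tMissing\n",
]
print(parse_busco_notation(sample_summary))
print(dict(count_statuses(sample_table)))
```

To use it on real output, pass the text of the short summary file to `parse_busco_notation` and an open file handle for the full table to `count_statuses`.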
2- Directories
- augustus_output - Augustus-predicted genes; only created during genome assessment.
  - File: augustus.log = full details on Augustus jobs
  - File: training_set_XXXX.txt = genes used for Augustus training
  - Folder: predicted_genes = Augustus raw gene output
  - Folder: extracted_proteins = Augustus protein FASTA output
  - Folder: retraining_parameters = Augustus training results
  - Folder: gb = GenBank format complete BUSCOs
  - Folder: gffs = General Feature Format complete BUSCOs
- blast_output - tBLASTn results; not created for assessment of proteins.
  - File: tblastn_XXXX.txt = tabular tBLASTn results
  - File: coordinates_XXXX.txt = locations of BUSCO matches (genome mode)
- hmmer_output - Tabular format HMMER output of searches with BUSCO HMMs.
- single_copy_busco_sequences - A FASTA format file for each complete single-copy BUSCO identified. .faa files contain protein sequences; .fna files contain coding sequences (DNA, genome mode only).
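After downloading the results, it can be handy to see which sequence files exist for each BUSCO in single_copy_busco_sequences. The following is a minimal sketch under stated assumptions: the helper name `collect_busco_sequences` is hypothetical, and it assumes each file is named after its BUSCO identifier with a `.faa` or `.fna` extension, as the description above suggests.

```python
from collections import defaultdict
from pathlib import Path


def collect_busco_sequences(directory):
    """Map each file stem (assumed to be the BUSCO ID) to the suffixes found for it."""
    found = defaultdict(list)
    # Sorting the paths keeps the suffix lists in a stable order (.faa before .fna).
    for path in sorted(Path(directory).iterdir()):
        if path.suffix in (".faa", ".fna"):
            found[path.stem].append(path.suffix)
    return dict(found)


# Path of the example run's output; only listed if it exists locally.
seq_dir = Path("run_example") / "single_copy_busco_sequences"
if seq_dir.is_dir():
    for busco_id, suffixes in collect_busco_sequences(seq_dir).items():
        print(busco_id, suffixes)
```

In genome mode each BUSCO would be expected to have both a `.faa` and a `.fna` entry; an ID with only `.faa` would be consistent with a run in another mode.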
More information on BUSCO-v2 inputs, outputs, and parameters can be found in the BUSCO user manual.