BUSCO in the Discovery Environment
The DE Quick Start tutorial provides an introduction to basic DE functionality and navigation.
Rationale and background:
BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs
Felipe A. Simão, Robert M. Waterhouse, Panagiotis Ioannidis, Evgenia V. Kriventseva, & Evgeny M. Zdobnov. Zdobnov’s Computational Evolutionary Genomics Group
Pre-Requisites
- A CyVerse account. (Register for a CyVerse account at user.cyverse.org)
- Mandatory arguments
  - Output folder name
  - Input file (genome assembly, gene set, or transcriptome) in FASTA format
  - Lineage data (select the BUSCO profile files for your species of interest from /iplant/home/shared/iplantcollaborative/example_data/BUSCO.sample.data). Version 2.0 adds a new lineage, "plantae".
  - Mode of analysis (genome, ogs, or trans. Default: genome)
- Optional arguments
  - Species: select from the pre-computed Augustus metaparameters. Selecting a closely related species usually produces better results. Valid options: see the Augustus help for the list of options (http://augustus.gobics.de/binaries/README.TXT). Default: generic. Version 2.0 adds several new species to choose from.
  - E-value: use a custom BLAST e-value cutoff. Default: 0.01
  - Custom flanking genomic regions in base pairs (bp): used when extending selected candidate regions before gene prediction. Default: flank sizes are calculated automatically from the genome size and range from 5 to 20 Kbp.
  - Perform full optimization for Augustus gene-finding training. Default: Off
  - Force overwriting of result files from a previous run with the same name
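The arguments above can be summarised as a small validation sketch. This is a hypothetical Python helper, not part of BUSCO or the DE: the class name `BuscoSettings` and all of its field names are assumptions chosen for illustration. Only the defaults (genome mode, generic species, e-value 0.01) and the list of valid modes come from this tutorial.

```python
from dataclasses import dataclass
from typing import Optional

# Modes of analysis listed in this tutorial.
VALID_MODES = ("genome", "ogs", "trans")


@dataclass
class BuscoSettings:
    """Hypothetical container mirroring the app's input form (names are assumptions)."""

    output_name: str
    input_fasta: str
    lineage: str
    mode: str = "genome"                      # default per the tutorial
    species: str = "generic"                  # default Augustus species
    evalue: float = 0.01                      # default BLAST e-value cutoff
    flank_bp: Optional[int] = None            # None means "calculated automatically"
    full_augustus_optimization: bool = False  # default: Off
    force_overwrite: bool = False

    def validate(self) -> None:
        """Raise ValueError if a mandatory field is empty or a value is out of range."""
        if not (self.output_name and self.input_fasta and self.lineage):
            raise ValueError("output name, input file and lineage are mandatory")
        if self.mode not in VALID_MODES:
            raise ValueError(f"mode must be one of {VALID_MODES}, got {self.mode!r}")
        if self.evalue <= 0:
            raise ValueError("e-value cutoff must be positive")
        if self.flank_bp is not None and self.flank_bp <= 0:
            raise ValueError("flank size must be a positive number of base pairs")


settings = BuscoSettings("sample_run", "target.fa", "eukaryota")
settings.validate()
print(settings.mode, settings.species, settings.evalue)
```

Leaving the optional fields at their defaults corresponds to filling in only the mandatory fields of the app.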
The following test data are provided for testing BUSCO in /iplant/home/shared/iplantcollaborative/example_data/BUSCO.sample.data:
- Input file - target.fa (genome sequences in FASTA format)
- Lineage data
Run the BUSCO assessment on the sequence file 'target.fa' in genome mode using the 'eukaryota' lineage.
Results
Successful execution of the BUSCO assessment pipeline will create a directory named run_<output folder name>. The directory will contain several files and directories:
1- Files
- short_summary_: Contains summary results in BUSCO notation and a brief breakdown of the metrics
- full_table_: Complete results in tabular format with coordinates, scores and lengths of BUSCO matches
- training_set_: Set of complete BUSCO matches used for training Augustus. Only created during genome assessment
- _tblastn: Results in tabular format of tBLASTn searches with BUSCO consensus sequences
2- Directories
- augustus_: Augustus-predicted genes. Only created during genome assessment
- augustus_proteins: Corresponding Augustus-predicted proteins. Only created during genome assessment
- selected: Complete BUSCO matches, used for training Augustus
- gb: Complete BUSCO matches, GenBank format
- gffs: Complete BUSCO matches, GFF format
- hmmer_output: Tabular format HMMER output of searches with BUSCO HMMs
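The summary file reports its results as a single BUSCO notation string. The sketch below shows one way to turn that string into numbers once the file has been downloaded. It assumes the notation has the form `C:..%[S:..%,D:..%],F:..%,M:..%,n:...` (complete, single-copy, duplicated, fragmented, missing, total groups searched) and also accepts a form without the `S:` field. The function name `parse_busco_notation` and the sample values are made up for illustration; check the string in your own summary file before relying on the pattern.

```python
import re
from typing import Dict

# Assumed layout of the notation string; the S: field is treated as optional.
_NOTATION = re.compile(
    r"C:(?P<C>[\d.]+)%\[(?:S:(?P<S>[\d.]+)%,)?D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_notation(text: str) -> Dict[str, float]:
    """Return the percentages (and the count n) from the first notation string in text."""
    match = _NOTATION.search(text)
    if match is None:
        raise ValueError("no BUSCO notation string found")
    result: Dict[str, float] = {}
    for key, value in match.groupdict().items():
        if value is None:  # optional S: field absent
            continue
        result[key] = int(value) if key == "n" else float(value)
    return result


# Illustrative values only, not results from the sample data.
example = "C:89.0%[S:85.8%,D:3.2%],F:6.9%,M:4.1%,n:3023"
print(parse_busco_notation(example))
```

Because the function searches for the pattern rather than matching the whole input, the full text of the summary file can be passed in directly.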