
BUSCO (Benchmarking Universal Single-Copy Orthologs) is a tool that provides quantitative measures for assessing genome assembly, gene set, and transcriptome completeness, based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB. BUSCO assessments are implemented in open-source software, with comprehensive lineage-specific sets of Benchmarking Universal Single-Copy Orthologs for arthropods, vertebrates, metazoans, fungi, eukaryotes, and bacteria. These conserved orthologs are ideal candidates for large-scale phylogenomics studies, and the annotated BUSCO gene models built during genome assessments provide a comprehensive gene predictor training set for use as part of genome annotation pipelines. BUSCO assessments offer intuitive metrics, based on expectations of gene content from hundreds of species, to gauge the completeness of rapidly accumulating genomic data and satisfy an Iberian's quest for quality: "Busco calidad/qualidade". The software is freely available for download at http://busco.ezlab.org/.


Pre-Requisites (for both versions 1.1b and 2.0)

  1. A CyVerse account. (Register for a CyVerse account here - user.cyverse.org)
  2. Mandatory arguments 
    1. Output folder name
    2. Input file (genome assembly, gene set, or transcriptome) in FASTA format
    3. Lineage data (You can select the BUSCO profile files for your species of interest from here: /iplant/home/shared/iplantcollaborative/example_data/BUSCO.sample.data). Version 2.0 adds a new lineage, "plantae".
    4. Mode of analysis (genome, ogs, or trans. Default: genome)
  3. Optional arguments
    1. Species (Select from the pre-computed Augustus metaparameters. Selecting a closely related species usually produces better results. Valid options: see the Augustus help for the list of options - http://augustus.gobics.de/binaries/README.TXT. Default: generic). Version 2.0 adds several new species that users can pick from.
    2. E-value (Use a custom BLAST e-value cutoff. Default: 0.01)
    3. Custom flanking genomic regions in base pairs (bp). Used when extending selected candidate regions before gene prediction. Default: flank sizes are calculated automatically based on genome size and range from 5 to 20 Kbp.
    4. Perform full optimization for Augustus gene-finding training. Default: Off
    5. Force overwriting of results files from a previous run with the same name
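
The arguments listed above can be collected and checked before a job is submitted. The following is a minimal sketch in Python, not part of BUSCO or CyVerse: the class name BuscoJobSettings, its field names, and the validate method are hypothetical, chosen only for illustration. The defaults for mode, species, and e-value are the ones stated in the list; the "off" default for force overwriting is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Modes of analysis named in the mandatory arguments list.
VALID_MODES = ("genome", "ogs", "trans")


@dataclass
class BuscoJobSettings:
    """Hypothetical container for the app's arguments (illustration only)."""

    # Mandatory arguments
    output_name: str          # output folder name
    input_fasta: str          # genome assembly, gene set, or transcriptome
    lineage: str              # BUSCO lineage profile, e.g. "eukaryota"
    mode: str = "genome"      # default stated in the list above

    # Optional arguments
    species: str = "generic"          # Augustus metaparameters
    evalue: float = 0.01              # BLAST e-value cutoff
    flank_bp: Optional[int] = None    # None = calculated from genome size
    full_optimization: bool = False   # Augustus training optimization, off
    force_overwrite: bool = False     # assumed to be off unless requested

    def validate(self) -> None:
        """Raise ValueError if a setting contradicts the list above."""
        for name in ("output_name", "input_fasta", "lineage"):
            if not getattr(self, name):
                raise ValueError(f"{name} is mandatory")
        if self.mode not in VALID_MODES:
            raise ValueError(f"mode must be one of {VALID_MODES}")
        if self.evalue <= 0:
            raise ValueError("evalue must be positive")
        if self.flank_bp is not None and self.flank_bp <= 0:
            raise ValueError("flank_bp must be a positive number of bp")


# Example: a genome-mode run that keeps every optional default.
settings = BuscoJobSettings("SAMPLE", "target.fa", "eukaryota")
settings.validate()
```

Leaving flank_bp as None mirrors the documented behaviour of letting the flank size be calculated automatically, so the sketch never has to guess a value.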

Test/sample data (for both versions 1.1b and 2.0)

The following test data are available for testing BUSCO at /iplant/home/shared/iplantcollaborative/example_data/BUSCO.sample.data:

...

Run a BUSCO assessment on the sequence file 'target.fa' in genome mode using the 'eukaryota' lineage

Results 

...

Successful execution of the BUSCO assessment pipeline will create a directory named run_<output folder name>. The directory will contain several files and directories:

...
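
Because the results directory name is derived from the output folder name, it can be located programmatically once a run has finished and the results have been downloaded. The sketch below assumes only the run_<output folder name> naming convention described above; the helper names busco_run_dir and list_run_contents are hypothetical, and no particular file names inside the directory are assumed.

```python
from pathlib import Path
from typing import List


def busco_run_dir(output_name: str, parent: Path = Path(".")) -> Path:
    """Return the expected results directory, run_<output folder name>."""
    return parent / f"run_{output_name}"


def list_run_contents(output_name: str, parent: Path = Path(".")) -> List[str]:
    """List entries in the results directory; subdirectories end with '/'."""
    run_dir = busco_run_dir(output_name, parent)
    if not run_dir.is_dir():
        raise FileNotFoundError(f"expected results directory not found: {run_dir}")
    return sorted(
        entry.name + ("/" if entry.is_dir() else "")
        for entry in run_dir.iterdir()
    )
```

For an output folder name of SAMPLE, busco_run_dir("SAMPLE") points at run_SAMPLE in the current directory, and list_run_contents("SAMPLE") raises FileNotFoundError if that directory does not exist, which usually means the run did not complete.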