QUAST (3.2 and 4.0) in the Discovery Environment

Alert:

 

The iPlant App Store is currently being restructured, and apps are being moved to an HPC environment. During this transition, users may occasionally be unable to locate or use apps that are listed in our tutorials. In many cases, these apps can be found by using the search bar at the top of the Apps window in the DE. To improve the chance of a match, search for only the part of the name that refers to the app's function or origin rather than the full app name and version number (e.g. 'SOAPdenovo' instead of 'SOAPdenovo-Trans 1.01'). In critical cases, please report your concern to the iPlant Ask forum or to support@iplantcollaborative.org. Thank you for your patience.

The DE Quick Start tutorial provides an introduction to basic DE functionality and navigation.

Please work through the tutorial and add your comments at the bottom of this page, or send comments by email to upendra@cyverse.org. Thank you.

Rationale and background:

QUAST: QUality ASsessment Tool for Genome Assemblies

Gurevich, A., Saveliev, V., Vyahhi, N., and Tesler, G. (2013) QUAST: quality assessment tool for genome assemblies. Bioinformatics 29, 1072-1075

QUAST is a tool for evaluating genome assemblies by computing various metrics, including:

  • N50, the length for which the collection of all contigs of that length or longer covers at least 50% of the assembly length,
  • NG50, defined like N50 except that at least 50% of the reference genome length must be covered instead of 50% of the assembly length,
  • NA50 and NGA50, the counterparts of N50 and NG50 in which aligned blocks are used instead of contigs,
  • misassemblies, misassembled and unaligned contigs or contig bases,
  • genes and operons covered.
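
The N50 and NG50 definitions above can be illustrated with a short sketch. This is not QUAST's own implementation. The function names and the contig lengths are made up for illustration, and QUAST may handle edge cases differently.

```python
def n_metric(lengths, target_length, fraction=0.5):
    """Return the largest length L such that contigs of length >= L
    together cover at least `fraction` of `target_length`.
    Returns None if the contigs never reach that coverage."""
    threshold = fraction * target_length
    running = 0
    # Walk the contigs from longest to shortest, accumulating their lengths.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    return None


def n50(lengths):
    # N50 measures coverage against the total assembly length.
    return n_metric(lengths, sum(lengths))


def ng50(lengths, reference_length):
    # NG50 measures coverage against the reference genome length instead.
    return n_metric(lengths, reference_length)


# Hypothetical contig lengths in bp, for illustration only.
contigs = [80, 70, 50, 40, 30, 20]
print(n50(contigs))        # N50 relative to the 290 bp assembly
print(ng50(contigs, 400))  # NG50 relative to a 400 bp reference
```

When the assembly is much shorter than the reference, the contigs may never cover half of the reference length. In that case this sketch returns None rather than a number.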

QUAST also builds plots for different metrics, including:

  • cumulative contigs length,
  • all kinds of N-metrics,
  • genes and operons covered,
  • GC content.

Versions

QUAST 3.2 and QUAST 4.0 accept the same arguments, so this tutorial applies to both versions. In addition to the features of 3.2, QUAST 4.0 includes the Icarus visualizer for de novo assembly evaluation.

Report Example

Pre-Requisites

  1. A CyVerse account. (Register for a CyVerse account here - user.cyverse.org)
  2. Inputs 
    1. Input file (genome assemblies in FASTA format, generated by any genome assembler)
  3. Output
    1. Output directory name
  4. Options
    1. FASTA file with the reference genome sequence (Users can select a genome from the list of available genomes or upload their own genome in zipped format)
    2. Genes (File with gene positions in reference)
    3. Operons (See manual for file formats)
    4. Skip contigs shorter than (default is 500bp)
  5. Advanced options
    1. Caption (Labels that will appear in your report. The number of labels should match the number of input files)
    2. Find genes (Enables gene finding. Affects performance)
    3. Scaffolds (The assemblies are scaffolds and not contigs)
    4. eukaryotic (find genes with GeneMark-ES) (The genome is eukaryotic. Affects gene finding and contig alignment. The default is prokaryotic, in which case GeneMarkS is used for gene finding)

Test/sample data

The test data for QUAST is located here: /iplant/home/shared/iplantcollaborative/example_data/QUAST.sample.data

Test run

  1. Open the QUAST-3.2 or QUAST-4.0 app in the DE
  2. Select/drag the input files (contigs_1.fasta and contigs_2.fasta) into the Inputs section of the app
  3. Enter the name of the output directory (quast_test_out) in the Output section of the app
  4. In the Options section of the app, select reference.fasta.gz, genes.txt and operons.txt for the reference genome, Gene file and Operons file, respectively
  5. Leave the rest of the options at their defaults for the test run and then click Launch
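
For readers who also run QUAST outside the DE, the test run above corresponds roughly to a command line of the following shape. This is a sketch only. The exact command issued by the DE app is not shown in this tutorial. The option names used here (-o, -R, --genes, --operons, --min-contig) are taken from the QUAST command-line manual and should be checked against quast.py --help for your installed version. The helper build_quast_command is a hypothetical name introduced for illustration.

```python
import shlex


def build_quast_command(assemblies, output_dir, reference=None,
                        genes=None, operons=None, min_contig=500):
    """Assemble a QUAST argument list mirroring the DE app fields.
    Assumption: quast.py is on PATH and accepts these option names."""
    cmd = ["quast.py", *assemblies, "-o", output_dir,
           "--min-contig", str(min_contig)]
    # Optional inputs are added only when supplied, as in the DE form.
    if reference:
        cmd += ["-R", reference]
    if genes:
        cmd += ["--genes", genes]
    if operons:
        cmd += ["--operons", operons]
    return cmd


cmd = build_quast_command(
    ["contigs_1.fasta", "contigs_2.fasta"],
    "quast_test_out",
    reference="reference.fasta.gz",
    genes="genes.txt",
    operons="operons.txt",
)
# Print the command for inspection; it is not executed here.
print(shlex.join(cmd))
```

The command is built as a list rather than a single string so that it can be passed to subprocess.run without shell quoting problems.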

Test Results

Successful execution of the QUAST assessment pipeline will create the following output in the quast_test_out folder:

  • report.txt: assessment summary in plain text format
  • report.tsv: tab-separated version of the summary, suitable for spreadsheets (Google Docs, Excel, etc.)
  • report.tex: LaTeX version of the summary
  • alignment.svg: contig alignment plot (file is created if the matplotlib Python library is installed)
  • report.pdf: all other plots combined with all tables (file is created if the matplotlib Python library is installed)
  • report.html: HTML version of the report with interactive plots inside
  • contigs_reports/
    • misassemblies_report: detailed report on misassemblies
    • unaligned_report: detailed report on unaligned and partially unaligned contigs
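
Because report.tsv is tab-separated, it can also be read programmatically after downloading it from the output folder. The sketch below assumes a layout in which the first row holds the assembly names and each following row holds one metric with one value per assembly. That layout, the sample values and the function name parse_quast_tsv are assumptions for illustration, so compare them with your own report.tsv before relying on this.

```python
import csv
import io


def parse_quast_tsv(text):
    """Turn tab-separated report text into
    {assembly_name: {metric_name: value_as_string}}."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    header, body = rows[0], rows[1:]
    assemblies = header[1:]  # first column is the metric label
    table = {name: {} for name in assemblies}
    for row in body:
        metric = row[0]
        for name, value in zip(assemblies, row[1:]):
            table[name][metric] = value
    return table


# Made-up sample content in the assumed layout, not real QUAST output.
sample = (
    "Assembly\tcontigs_1\tcontigs_2\n"
    "# contigs\t3\t5\n"
    "N50\t100\t200\n"
)
report = parse_quast_tsv(sample)
print(report["contigs_2"]["N50"])
```

Values are kept as strings because some cells in a report may not be plain numbers.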


Note: 

  • metrics based on a reference genome are computed only if a reference is provided
  • metrics based on genes and operons are computed only if proper annotations are provided

A more detailed explanation of the above output is provided in the QUAST manual.