QUAST (3.2 and 4.0) in the Discovery Environment
| The iPlant App Store is currently being restructured, and apps are being moved to an HPC environment. During this transition, users may occasionally be unable to locate or use apps that are listed in our tutorials. In many cases, these apps can be found with the search bar at the top of the Apps window in the DE. To improve the chance of a match, search for only the part of the name that refers to the app's function or origin rather than the full app name and version number (e.g. 'SOAPdenovo' instead of 'SOAPdenovo-Trans 1.01'). In critical cases, please report your concern to the iPlant Ask forum or to support@iplantcollaborative.org. Thank you for your patience. |
The DE Quick Start tutorial provides an introduction to basic DE functionality and navigation.
Rationale and background:
QUAST: QUality ASsessment Tool for Genome Assemblies
Gurevich, A., Saveliev, V., Vyahhi, N., and Tesler, G. (2013) QUAST: quality assessment tool for genome assemblies. Bioinformatics 29, 1072-1075
QUAST is a tool for evaluating genome assemblies by computing various metrics, including
- N50, length for which the collection of all contigs of that length or longer covers at least 50% of assembly length,
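- NG50, the same as N50 except that the contigs must cover at least 50% of the reference genome length instead of the assembly length,
- NA50 and NGA50, the same as N50 and NG50 except that aligned blocks are used instead of contigs,
- misassemblies, misassembled and unaligned contigs or contig bases,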
- genes and operons covered
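The N50 and NG50 definitions above can be illustrated with a minimal Python sketch. This is not QUAST's own code; the function name `nx` and the contig lengths are made up for illustration.

```python
def nx(lengths, fraction=0.5, total=None):
    """Return the contig length L such that contigs of length >= L
    cover at least `fraction` of `total`.

    With total=None the assembly length is used (N50 when fraction=0.5).
    Passing the reference genome length as `total` gives NG50 instead.
    """
    if total is None:
        total = sum(lengths)
    threshold = fraction * total
    running = 0
    # Walk from the longest contig down, accumulating covered length.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    # The contigs never reach the threshold (possible for NG50 when
    # the assembly is much shorter than the reference).
    return None


# Hypothetical contig lengths; the assembly length is 300.
contigs = [80, 70, 50, 40, 30, 20, 10]

n50 = nx(contigs)              # threshold is 150: 80 + 70 reaches it
ng50 = nx(contigs, total=400)  # threshold is 200: 80 + 70 + 50 reaches it
print(n50, ng50)
```

NA50 and NGA50 would follow the same calculation with the lengths of aligned blocks passed in place of contig lengths.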
QUAST builds convenient plots for different metrics:
- cumulative contig length,
- all kinds of N-metrics,
- genes and operons covered,
- GC content.
Versions
QUAST 3.2 and QUAST 4.0 accept the same arguments, so this tutorial applies to both versions. In addition to the features of 3.2, QUAST 4.0 includes the Icarus visualizer for de novo assembly evaluation.
Pre-Requisites
- A CyVerse account. (Register for a CyVerse account here - user.cyverse.org)
- Inputs
- Input file (Genome assemblies in FASTA format, generated by any genome assembler)
- Output
- Output directory name
- Options
- FASTA file with the reference genome sequence (Users can select a genome from the list of available genomes or upload their own genome in zipped format)
- Genes (File with gene positions in reference)
- Operons (See manual for file formats)
- Skip contigs shorter than (default is 500bp)
- Advanced options
- Caption (Labels that will appear in your report. The number of labels should match the number of input files)
- Find genes (Enables gene finding. Affects performance)
- Scaffolds (The assemblies are scaffolds and not contigs)
- eukaryotic (find genes with GeneMark-ES) (Genome is eukaryotic. Affects gene finding and contig alignment. Default is prokaryotic (GeneMarkS is used for gene finding))
The test data for QUAST is here: /iplant/home/shared/iplantcollaborative/example_data/QUAST.sample.data
Test run
- Open the QUAST-3.2 or QUAST-4.0 app in the DE
- Select/drag the input files (contigs_1.fasta and contigs_2.fasta) into the Inputs section of the app
- Enter the name of the output directory (quast_test_out) in the Output section of the app
- Select reference.fasta.gz, genes.txt and operons.txt for the reference genome, Gene file and Operons file, respectively, in the Options section of the app
- Leave the rest of the options at their defaults for the test run and then click Launch
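For readers who also run QUAST outside the DE, the test run above corresponds roughly to a single `quast.py` command. The Python sketch below assembles that command as an argument list. It assumes the app options map onto the command-line flags described in the QUAST manual; how the DE app actually invokes QUAST is not documented here, and `build_quast_command` is a hypothetical helper, so check the flags against `quast.py --help` for your version.

```python
import shlex


def build_quast_command(assemblies, output_dir, reference=None, genes=None,
                        operons=None, min_contig=500, labels=None,
                        find_genes=False, scaffolds=False, eukaryote=False):
    """Build an argument list for quast.py (hypothetical helper).

    The mapping from DE app options to flags is an assumption.
    """
    cmd = ["quast.py", *assemblies, "-o", output_dir,
           "--min-contig", str(min_contig)]
    if reference:
        cmd += ["-R", reference]           # reference genome (FASTA)
    if genes:
        cmd += ["-G", genes]               # gene positions in the reference
    if operons:
        cmd += ["-O", operons]             # operon positions in the reference
    if labels:
        cmd += ["--labels", ",".join(labels)]  # one label per assembly
    if find_genes:
        cmd.append("--gene-finding")
    if scaffolds:
        cmd.append("--scaffolds")
    if eukaryote:
        cmd.append("--eukaryote")
    return cmd


# The test run from this tutorial, with default options.
cmd = build_quast_command(
    ["contigs_1.fasta", "contigs_2.fasta"],
    "quast_test_out",
    reference="reference.fasta.gz",
    genes="genes.txt",
    operons="operons.txt",
)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps file names with spaces intact if the list is later passed to `subprocess.run`.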
Test Results
Successful execution of the QUAST assessment pipeline will create the following output in the quast_test_out folder:
report.txt | assessment summary in plain text format |
report.tsv | tab-separated version of the summary, suitable for spreadsheets (Google Docs, Excel, etc.) |
report.tex | LaTeX version of the summary |
alignment.svg | contig alignment plot (file is created if matplotlib python library is installed) |
report.pdf | all other plots combined with all tables (file is created if matplotlib python library is installed) |
report.html | HTML version of the report with interactive plots inside |
contigs_reports/ | |
contigs_reports/misassemblies_report | detailed report on misassemblies |
contigs_reports/unaligned_report | detailed report on unaligned and partially unaligned contigs |
Note:
- metrics based on a reference genome are computed only if a reference is provided
- metrics based on genes and operons are computed only if proper annotations are provided
A more detailed explanation of the above output is provided in the QUAST manual.
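Because report.tsv is tab-separated, it can also be read programmatically. The Python sketch below assumes the layout has a header row listing the assemblies and one row per metric with the metric name in the first column; the sample values are made up for illustration and are not results from the test data.

```python
import csv
import io


def read_quast_report(handle):
    """Parse a report.tsv-style table into {assembly: {metric: value}}.

    Assumes the first row holds assembly names (after a leading label
    cell) and every later row is a metric name followed by one value
    per assembly. Values are kept as strings.
    """
    rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    header, body = rows[0], rows[1:]
    assemblies = header[1:]
    report = {name: {} for name in assemblies}
    for row in body:
        metric, values = row[0], row[1:]
        for name, value in zip(assemblies, values):
            report[name][metric] = value
    return report


# Made-up sample in the assumed layout.
sample = (
    "Assembly\tcontigs_1\tcontigs_2\n"
    "# contigs\t3\t5\n"
    "N50\t1200\t900\n"
)
report = read_quast_report(io.StringIO(sample))
print(report["contigs_1"]["N50"])
```

To read a real file, pass an open handle instead, for example `open("quast_test_out/report.tsv", newline="")`.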