Using the SPAdes Assembler

SPAdes is a genome assembler intended mainly for microbial data and metagenomics studies. It includes a read error-correction step and is regarded by some as a very accurate assembler. In the CyVerse Discovery Environment, the standard app handles small jobs with a memory limit of 64 GB; a Hi-Mem version has access to up to 990 GB for larger jobs. The app has inputs for PacBio, Nanopore, Sanger, and Ion Torrent data, but it is configured mainly for Illumina-type reads in FASTQ format.

A test run, with the test data...

The test data is in Community Data > iplantcollaborative > example_data > spades > input

The test data consists of a set of paired-end reads (180 bp spacing) for Rhodobacter sphaeroides and a set of jumping reads (3000 bp spacing). Put the paired-end read files in the paired inputs for Library 1, and the jumping read files in the paired inputs for Library 2. Under options, set Library 1 to paired reads for content and paired-end for pairing format. Set Library 2 to paired reads for content and mate pair for pairing format. Set the k-mer size to 35 or 41. Leave the remaining options at their defaults (not metagenomics, not single cell, Illumina data).
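
For readers who want to see how these settings might look outside the Discovery Environment, here is a minimal sketch in Python that builds a comparable spades.py command line. It uses the library flags from the SPAdes 3.x command-line manual (--pe1-1/--pe1-2 for the paired-end library, --mp1-1/--mp1-2 for the mate-pair library, -k, -o). How the app itself translates its form fields into flags is an assumption here, and the function name, file names, and output directory are hypothetical placeholders, not the names of the actual test files. Note that on the command line, libraries are numbered per type, so the app's "Library 2" becomes the first mate-pair library.

```python
from typing import List, Sequence


def build_spades_command(
    pe_forward: str,
    pe_reverse: str,
    mp_forward: str,
    mp_reverse: str,
    output_dir: str,
    kmers: Sequence[int] = (35,),
) -> List[str]:
    """Build an argument list for spades.py that mirrors the test-run settings.

    Hypothetical helper for illustration; it only assembles the command
    and does not run SPAdes.
    """
    for k in kmers:
        # SPAdes accepts only odd k-mer sizes below 128.
        if k % 2 == 0 or not 0 < k < 128:
            raise ValueError(f"k-mer size must be odd and below 128, got {k}")

    return [
        "spades.py",
        # Library 1 in the app: paired reads, paired-end format.
        "--pe1-1", pe_forward,
        "--pe1-2", pe_reverse,
        # Library 2 in the app: paired reads, mate pair format (jumping reads).
        "--mp1-1", mp_forward,
        "--mp1-2", mp_reverse,
        # k-mer size(s), comma-separated if more than one.
        "-k", ",".join(str(k) for k in kmers),
        "-o", output_dir,
    ]


if __name__ == "__main__":
    # Placeholder file names; substitute the files from the input folder.
    command = build_spades_command(
        "frag_1.fastq", "frag_2.fastq",
        "jump_1.fastq", "jump_2.fastq",
        "spades_output",
    )
    print(" ".join(command))
```

The list form can be passed directly to subprocess.run on a machine where SPAdes is installed. Metagenomics and single-cell modes are left off, matching the defaults described above.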

The results for the test data are in Community Data > iplantcollaborative > example_data > spades > output

About the SPAdes-high-mem 3.6.0 app...

This app uses a slightly earlier version of SPAdes than the SPAdes-3.8.0 app, but it runs on a largemem node of the Stampede server, so it has 1 TB of RAM available.

More info on SPAdes: http://bioinf.spbau.ru/spades