Assembling a Genome
Genome Assembly Applications
Ray: An easy-to-use yet powerful assembler that can also handle longer reads, such as 454 reads. Make a folder in the DE, put your reads in it, drag the folder into the input, set the kmer value, and run. That's it. Just make sure your reads are really clean, and name your paired read files so that it is easy to identify them as pairs.
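As an illustration of the naming advice above, here is a minimal sketch that checks whether read files pair up by name. The `_R1`/`_R2` suffix convention and the function name `pair_read_files` are assumptions made for this example; they are not requirements of Ray or of the DE app.

```python
import re

# Hypothetical naming convention: <stem>_R1<ext> and <stem>_R2<ext>.
PAIR_PATTERN = re.compile(r"^(?P<stem>.+)_R(?P<mate>[12])(?P<rest>\..+)$")


def pair_read_files(filenames):
    """Group file names into (mate1, mate2) pairs; return (pairs, unpaired)."""
    groups = {}
    unpaired = []
    for name in filenames:
        match = PAIR_PATTERN.match(name)
        if match is None:
            # Name does not follow the assumed convention.
            unpaired.append(name)
            continue
        key = (match.group("stem"), match.group("rest"))
        groups.setdefault(key, {})[match.group("mate")] = name

    pairs = []
    for key in sorted(groups):
        mates = groups[key]
        if "1" in mates and "2" in mates:
            pairs.append((mates["1"], mates["2"]))
        else:
            # Only one mate was found for this stem.
            unpaired.extend(mates.values())
    return pairs, sorted(unpaired)


files = ["lib1_R1.fastq", "lib1_R2.fastq", "lib2_R1.fastq", "single.fastq"]
pairs, unpaired = pair_read_files(files)
print(pairs)
print(unpaired)
```

Any file reported as unpaired is worth renaming or double-checking before it goes into the input folder.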
AllpathsLG: One of the most demanding genome assemblers to use, yet very powerful. You need at least 2 paired read libraries: one with fragment reads that overlap in the middle (e.g. 100 bp reads with 180 bp spacing), and one with mate pair reads spanning over 1000 bp. If you don't have these libraries, forget it. If you do, it can produce some of the best assemblies that you can hope for. It uses a built-in error-correction routine, so you may not want to do too much quality trimming first. There are essentially 2 versions of apps for AllpathsLG. One is labelled small because it runs on 1 normal node of (currently) the Stampede server at TACC, so it has just 32 GB of RAM available. The regular AllpathsLG app (like most of the assemblers) runs on a largemem node, so for the current app on Stampede, it has 1 TB of RAM to work with. To judge the amount of memory required by your assembly, assume that all your reads need to go into memory, and then add the approximate amount of memory occupied by your assembled genome. This is a rough guess, but may help guide you. Err on the small side if you are thinking of using the AllpathsLG-small app. The small app will start its job much sooner on average because it uses a normal node, and normal nodes are far less scarce than largemem nodes. It works very well with bacterial genomes, but other small genomes can also be assembled with it.
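The memory rule of thumb above can be written out as a small calculation. This is only a sketch of that rough guess: the one-byte-per-base figure, the `headroom` parameter, and the function names are assumptions for illustration, not anything AllpathsLG itself reports or uses.

```python
def estimate_assembly_memory_gb(total_read_bases, genome_size_bases,
                                bytes_per_base=1.0):
    """Rough guess: all reads in memory plus the assembled genome.

    bytes_per_base=1.0 is an assumed storage cost, not a measured one.
    """
    total_bytes = (total_read_bases + genome_size_bases) * bytes_per_base
    return total_bytes / 1e9


def fits_node(estimate_gb, node_ram_gb, headroom=1.0):
    """Check the estimate against a node's RAM.

    headroom is a hypothetical multiplier; raise it above 1.0 to be cautious.
    """
    return estimate_gb * headroom <= node_ram_gb


# Example: a 5 Mbp bacterial genome sequenced to 100x coverage.
genome_size = 5_000_000
read_bases = genome_size * 100
estimate = estimate_assembly_memory_gb(read_bases, genome_size)
print(f"Estimated memory: {estimate:.3f} GB")
print("Fits the 32 GB small node:", fits_node(estimate, node_ram_gb=32))
```

Real memory use depends on the assembler's internal data structures, so treat the number as a guide for choosing between the small and regular apps rather than as a guarantee.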
Soapdenovo2: An excellent genome assembler for Illumina reads. Paired-end read files must be entered in the proper order in any of the 5 possible library inputs. Any single reads go in input1 of the 5th library only. Enter a maximum read length for all of the libraries, set the kmer value, and make sure all of the parameters are set for the libraries that have data in them. Once it has run, you can drag the output directory into Gapcloser to help finish the assembly.
Velvet: An old standard at this point, but still an effective assembler. Velvet can be used with short or long reads, and even with SAM or BAM files (sorted by name) when the assembly is reference-guided. First enter your data into VelvetH with the appropriate settings, including a kmer value, and run. Then drag the Velvet output directory into the input of VelvetG, and set the parameters, e.g. paired end insert lengths, and run.
Newbler: Roche's supported assembler for 454 data. It is a very effective assembler and uses the SFF file format natively.
SPAdes: A small genome assembler that has been popular for bacterial genomes. Works with Illumina, Ion Torrent, Oxford Nanopore and even PacBio data. There are essentially 2 versions of apps for SPAdes. One is labelled high-mem because it runs on the largemem node of (currently) the Stampede server at TACC, so it has 1 TB of RAM available. The regular SPAdes-3.8.0 app runs on a single, normal node on the Lonestar 5 server, so it has 64 GB of RAM to work with, which is plenty for most bacterial genome assemblies. To judge the amount of memory required by your assembly, assume that all your reads need to go into memory, and then add the approximate amount of memory occupied by your assembled genome. This is a rough guess, but may help guide you. Err on the small side if you are thinking of using the SPAdes-3.8.0 app. The SPAdes-3.8.0 app will start its job much sooner on average because it uses a normal node, and normal nodes are far less scarce than the largemem node used by the high-mem app.
HTProcess-jellyfish: A relatively fast kmer-counting program. It provides information about the abundance vs. copy number of different kmers.
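To make the idea of kmer counting concrete, here is a toy sketch in plain Python. It is not how Jellyfish is implemented and is far too slow for real data; the function names are made up for this example, and unlike some kmer counters it does not merge a kmer with its reverse complement.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every kmer of length k across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # skip kmers containing ambiguous bases
                counts[kmer] += 1
    return counts


def abundance_histogram(counts):
    """Map each copy number to the number of distinct kmers seen that often."""
    return dict(sorted(Counter(counts.values()).items()))


reads = ["ACGTACGT", "ACGTTT"]
counts = count_kmers(reads, k=4)
print(counts["ACGT"])               # 3
print(abundance_histogram(counts))  # {1: 5, 3: 1}
```

The histogram is the kind of summary a kmer counter gives you: how many distinct kmers occur once, twice, and so on. Kmers seen only once or twice in deep sequencing data are often sequencing errors.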
HTProcess-kmergenie: Though technically not an assembler, this kmer-counting application is used for guiding assembly, and is an alternative to HTProcess-jellyfish. Both help give you an idea of what your kmer coverage is like.
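For a quick sense of what kmer coverage means, a commonly used relation (given, for example, in the Velvet manual) is Ck = C * (L - k + 1) / L, where C is the base coverage, L the read length, and k the kmer size. The sketch below simply evaluates that formula; it does not reproduce what Jellyfish or KmerGenie compute from your reads, and the function name is an assumption for this example.

```python
def kmer_coverage(base_coverage, read_length, k):
    """Expected kmer coverage: Ck = C * (L - k + 1) / L."""
    if k < 1 or k > read_length:
        raise ValueError("k must be between 1 and the read length")
    return base_coverage * (read_length - k + 1) / read_length


# Example: 50x base coverage with 100 bp reads at several kmer sizes.
for k in (21, 31, 51):
    print(k, kmer_coverage(50, 100, k))
```

Larger kmer values lower the kmer coverage for the same data, which is one reason the choice of kmer matters when you set it in the assemblers above.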