AbySS Short Read Assembler
The AbySS Short Read Assembler application performs de novo assembly of up to three paired-end sequence collections with different insert sizes, plus one single-end collection. It is based on ABySS 1.2.7 from Canada's Michael Smith Genome Sciences Centre, with driver scripts that let it run in the iPlant Discovery Environment.
Quick Start
- This deployment of ABySS accepts FASTA and FASTQ sequence files. Upload your files, perform the pre-processing steps described below, and then launch the assembly. The output is a directory containing contigs and associated support files.
A more detailed overview
A successful assembly requires the following steps, performed in order:
- Interlace paired-end files
- Bundle related sequence files
- Launch AbySS
Interlacing
Each lane of paired-end sequence in your data set must be interlaced. To do this, open the "Interlace Paired FASTQ files" app and select the left and right (also known as Run 1 and Run 2) files for a given lane. Launch the app; the resulting single file contains the reads in alternating order, with each Run 1 read followed immediately by its Run 2 mate. For example, interlacing the left and right files below produces the third listing.
Left file (Run 1):
```
@SRR040820.1/1 HWI-EAS248_5_Run18a:1:1:1555:378 length=50
GAAGGAGTCGACCCTTCACCTCGTGCTCCGTCTTCGTGGTGGATTCTAAG
+SRR040820.1/1 HWI-EAS248_5_Run18a:1:1:1555:378 length=50
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIG:II%II(I
@SRR040820.2/1 HWI-EAS248_5_Run18a:1:1:1527:371 length=50
GTCTTCTGATCCAGATGCGTAGGATTCCCCACCGGGTGAAAATCGCACGT
+SRR040820.2/1 HWI-EAS248_5_Run18a:1:1:1527:371 length=50
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHI%
```
Right file (Run 2):
```
@SRR040820.1/2 HWI-EAS248_5_Run18a:1:1:1555:378 length=55
TTAGGGTTAAAAAACCATTTTATTCAATTGAAACTAAGGATTGATTCACAAGCCC
+SRR040820.1/2 HWI-EAS248_5_Run18a:1:1:1555:378 length=55
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIF
@SRR040820.2/2 HWI-EAS248_5_Run18a:1:1:1527:371 length=55
ATCGAGTCAGCCTCATTGGAACCAAAGCTTGGGAATAAATTTATTGCTGGTGGAG
+SRR040820.2/2 HWI-EAS248_5_Run18a:1:1:1527:371 length=55
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII%II6/3
```
Interlaced result:
```
@SRR040820.1/1 HWI-EAS248_5_Run18a:1:1:1555:378 length=50
GAAGGAGTCGACCCTTCACCTCGTGCTCCGTCTTCGTGGTGGATTCTAAG
+SRR040820.1/1 HWI-EAS248_5_Run18a:1:1:1555:378 length=50
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIG:II%II(I
@SRR040820.1/2 HWI-EAS248_5_Run18a:1:1:1555:378 length=55
TTAGGGTTAAAAAACCATTTTATTCAATTGAAACTAAGGATTGATTCACAAGCCC
+SRR040820.1/2 HWI-EAS248_5_Run18a:1:1:1555:378 length=55
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIF
@SRR040820.2/1 HWI-EAS248_5_Run18a:1:1:1527:371 length=50
GTCTTCTGATCCAGATGCGTAGGATTCCCCACCGGGTGAAAATCGCACGT
+SRR040820.2/1 HWI-EAS248_5_Run18a:1:1:1527:371 length=50
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHI%
@SRR040820.2/2 HWI-EAS248_5_Run18a:1:1:1527:371 length=55
ATCGAGTCAGCCTCATTGGAACCAAAGCTTGGGAATAAATTTATTGCTGGTGGAG
+SRR040820.2/2 HWI-EAS248_5_Run18a:1:1:1527:371 length=55
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII%II6/3
```
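If you want to check or reproduce this step outside the Discovery Environment, the following is a minimal Python sketch of interlacing. It assumes plain four-line FASTQ records and mate names ending in /1 and /2, as in the example above. It is not the code behind the "Interlace Paired FASTQ files" app; `read_fastq`, `mate_name`, and `interlace` are hypothetical names.

```python
from itertools import zip_longest
from typing import Iterator, List, TextIO


def read_fastq(handle: TextIO) -> Iterator[List[str]]:
    """Yield 4-line FASTQ records: header, sequence, '+' line, qualities."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:
            return  # clean end of file
        if not record[3]:
            raise ValueError("truncated FASTQ record: " + record[0].strip())
        # Ensure every line ends with a newline so records never run together.
        yield [line if line.endswith("\n") else line + "\n" for line in record]


def mate_name(header: str) -> str:
    """Read name without the leading '@', the description, or a /1 or /2 suffix."""
    name = header[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def interlace(left: TextIO, right: TextIO, out: TextIO) -> int:
    """Write each left record followed by its right mate; return the pair count."""
    pairs = 0
    for rec1, rec2 in zip_longest(read_fastq(left), read_fastq(right)):
        if rec1 is None or rec2 is None:
            raise ValueError("left and right files contain different numbers of reads")
        if mate_name(rec1[0]) != mate_name(rec2[0]):
            raise ValueError(f"mates out of order: {rec1[0].strip()} vs {rec2[0].strip()}")
        out.writelines(rec1)
        out.writelines(rec2)
        pairs += 1
    return pairs
```

Calling `interlace` with open handles for a lane's Run 1 and Run 2 files and a writable handle for the output file produces the alternating layout shown above. The mate-name check stops early if the two files are not in the same read order, which would otherwise silently pair unrelated reads.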
Bundling
All interlaced files from paired-end libraries with the same insert size must be bundled together using the "Concatenate Multiple Files" app. For example, three lanes of 500-bp insert sequence from the same flowcell must be bundled together, and so must lanes from different flowcells that share that insert size. For each insert size, launch the "Concatenate Multiple Files" app, select the interlaced files in your data browser in the order you want them joined into one file, and launch the app. The result is one large file per insert size. You must also concatenate all single-end sequences into a single file, because the AbySS app accepts only one single-end library. Sequences bundled into a library may have heterogeneous lengths (for example, you may bundle 36-bp and 50-bp single-end reads together).
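Bundling amounts to joining files end to end. Because every interlaced input holds whole pairs, concatenation keeps each Run 1 read next to its Run 2 mate. The Python sketch below assumes the app simply joins the selected files in order without altering them; `bundle` is a hypothetical helper, not the app's implementation.

```python
import os
import shutil
from pathlib import Path
from typing import Sequence


def bundle(inputs: Sequence[Path], output: Path) -> None:
    """Concatenate input files, in the order given, into a single output file."""
    if output in inputs:
        raise ValueError("the output file must not also be an input")
    with output.open("wb") as out:
        for path in inputs:
            with path.open("rb") as src:
                shutil.copyfileobj(src, out)
                # A missing final newline would fuse this file's last line with
                # the next file's first header, so add one if needed.
                if src.tell() > 0:
                    src.seek(-1, os.SEEK_END)
                    if src.read(1) != b"\n":
                        out.write(b"\n")
```

The newline guard is a defensive choice. A plain byte-for-byte join of a file that lacks a trailing newline would corrupt the first record of the next file.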
Assembly
Select up to three interlaced and bundled paired-end sequence collections and at most one bundled single-end sequence collection. Advance to the Assembly Options section of the application and set the k-mer size and other assembly options. Then select run-time parameters appropriate to the estimated assembly size and the volume of source data.
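The k-mer size is the length of the overlapping words from which a de Bruijn graph assembler such as ABySS builds its graph. As an illustration only (this is not ABySS code), the hypothetical `kmers` function below lists the k-mers in a read. A read of length L yields L - k + 1 k-mers, so a read shorter than k contributes none.

```python
from typing import List


def kmers(read: str, k: int) -> List[str]:
    """Return every overlapping substring of length k in a read (L - k + 1 of them)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # For k > len(read) the range is empty, so no k-mers are produced.
    return [read[i:i + k] for i in range(len(read) - k + 1)]
```

For example, the 50-bp read SRR040820.1/1 shown earlier yields 50 - 27 + 1 = 24 k-mers at the k = 27 used in the test case below. Keep k no longer than your shortest reads if you want all of them to contribute.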
Example Workflow
Mark has two paired-end sequence libraries: one has an insert size of 500 bp and the other 2500 bp. He also has 10 lanes of single-end sequence. The 500-bp library consists of two lanes (lanes 1 and 3 in this example), and the 2500-bp library has one lane (lane 4). The following workflow assembles these sequences:
- 500-bp library
  - Interlace the Run1 and Run2 reads from lane 1 into a file 500bp_s_1i.fq
  - Interlace the Run1 and Run2 reads from lane 3 into a file 500bp_s_3i.fq
  - Concatenate the two interlaced lane files, 500bp_s_1i.fq and 500bp_s_3i.fq, into a single file named 500bp_s_13i.fq
- 2500-bp library
  - Interlace the Run1 and Run2 reads from lane 4 into a file 2500bp_s_4i.fq (a single lane needs no concatenation)
- Concatenate the single-end sequences into a file Single.fq
- Launch the AbySS Short Read Assembler.
  - Select 500bp_s_13i.fq for Paired-end library 1
  - Select 2500bp_s_4i.fq for Paired-end library 2
  - Select Single.fq for the Single-end library
  - Choose parameters and launch the assembly
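The file naming in this workflow follows a simple pattern: one interlaced file per lane, and one bundle per insert size whose name joins the lane numbers. The Python sketch below captures that convention and the app's limit of three paired-end libraries. `plan_bundles` is a hypothetical helper for bookkeeping; it does not interact with the Discovery Environment.

```python
from typing import Dict, List

MAX_PAIRED_LIBRARIES = 3  # the app accepts up to three paired-end libraries


def plan_bundles(lanes_by_insert: Dict[int, List[int]]) -> Dict[str, List[str]]:
    """Map each bundled library file name to the interlaced lane files it joins."""
    if len(lanes_by_insert) > MAX_PAIRED_LIBRARIES:
        raise ValueError("the app accepts at most three paired-end libraries")
    plan: Dict[str, List[str]] = {}
    for insert, lanes in sorted(lanes_by_insert.items()):
        # One interlaced file per lane, e.g. 500bp_s_1i.fq for lane 1.
        lane_files = [f"{insert}bp_s_{lane}i.fq" for lane in lanes]
        # The bundle name joins the lane numbers, e.g. 500bp_s_13i.fq.
        lane_tag = "".join(str(lane) for lane in lanes)
        plan[f"{insert}bp_s_{lane_tag}i.fq"] = lane_files
    return plan
```

For Mark's data, `plan_bundles({500: [1, 3], 2500: [4]})` maps 500bp_s_13i.fq to its two lane files and maps 2500bp_s_4i.fq to itself, which is why that library needs no concatenation. Joining lane numbers becomes ambiguous once lanes reach two digits, so treat the pattern as a convention rather than a requirement.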
Test Case
All files are located in the Community Data directory of the iPlant Discovery Environment at the following path:
Community Data > iplantcollaborative > example_data > abyss
Input file(s)
- Use the file SRR040820_Euc_RNA_pe.fq (6.1 GB) as a test paired-end input for "Paired-end Sequence Library #1". Leave the other libraries empty.
Parameters used in app
When running the app in the Discovery Environment, use the following parameters with the input file above to reproduce the output described in the next section.
- K-mer size – 27
- Name of this assembly – euc
- Job Size – Transcriptome, Multiple Flowcell
- Leave all other parameters at their default values
Output file(s)
Compare your results with the reference files at http://mirrors.iplantcollaborative.org/example_data/agave_api/abyss-param-complex-ranger-1.2.7/k27 or under Community Data > iplantcollaborative > example_data > abyss > outputs. They should be identical.
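If you download both your output directory and the reference outputs, a byte-level comparison can confirm that they match. The Python sketch below assumes both copies are on a local file system; `checksums` and `compare_outputs` are hypothetical helpers. Any file that records run-specific details, such as a log, may be reported even when the contigs match.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def checksums(root: Path) -> Dict[str, str]:
    """Map each file's path relative to root to its SHA-256 hex digest."""
    result: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as handle:
                # Read in 1 MiB chunks so large assembly files fit in memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            result[path.relative_to(root).as_posix()] = digest.hexdigest()
    return result


def compare_outputs(expected: Path, actual: Path) -> List[str]:
    """Return relative paths that are missing, extra, or differ in content."""
    want, got = checksums(expected), checksums(actual)
    return sorted(name for name in want.keys() | got.keys() if want.get(name) != got.get(name))
```

An empty list means every file in the two directories is byte-identical.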