Sailfish_align_quant-0.9.2
The DE Quick Start tutorial provides an introduction to basic DE functionality and navigation.
Please work through the tutorial and add your comments at the bottom of this page, or send comments by email to upendra@cyverse.org. Thank you.
Rationale and background:
Rob Patro, Stephen M. Mount, and Carl Kingsford (2014). Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms. Nature Biotechnology (doi:10.1038/nbt.2862)
Sailfish is a tool for transcript quantification from RNA-seq data. It requires a set of target transcripts (either from a reference or a de novo assembly) to quantify. All you need to run Sailfish is a fasta file containing your reference transcripts and one or more fasta/fastq files containing your reads. Sailfish runs in two phases: indexing and quantification. The indexing step is independent of the reads and only needs to be run once for a particular set of reference transcripts and choice of k (the k-mer size). The quantification step is specific to the set of RNA-seq reads and is therefore run more frequently.
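The two phases can be sketched as two command lines. The following is a minimal sketch, not the DE app's actual wrapper: it only builds the argument lists, assuming the `sailfish index -t/-o/-k` and `sailfish quant -i/-l/-1/-2/-r/-o` usage given in the Sailfish documentation (please confirm against `sailfish index --help` and `sailfish quant --help` for your version). The function names, the directory names and the value k=31 are illustrative assumptions.

```python
from typing import List, Optional


def index_command(transcripts: str, index_dir: str, kmer_len: int) -> List[str]:
    """Indexing phase: depends only on the transcripts and k, so run it once."""
    return ["sailfish", "index", "-t", transcripts, "-o", index_dir, "-k", str(kmer_len)]


def quant_command(
    index_dir: str,
    lib_type: str,
    out_dir: str,
    mates1: Optional[str] = None,
    mates2: Optional[str] = None,
    unmated: Optional[str] = None,
) -> List[str]:
    """Quantification phase: run once per set of reads against an existing index."""
    cmd = ["sailfish", "quant", "-i", index_dir, "-l", lib_type]
    if unmated is not None:
        if mates1 is not None or mates2 is not None:
            raise ValueError("give either unmated reads or a mate pair, not both")
        cmd += ["-r", unmated]  # single-end reads
    elif mates1 is not None and mates2 is not None:
        cmd += ["-1", mates1, "-2", mates2]  # paired-end reads
    else:
        raise ValueError("give unmated reads, or both mates1 and mates2")
    return cmd + ["-o", out_dir]


if __name__ == "__main__":
    # Hypothetical directory names; k=31 is only an illustrative choice.
    print(" ".join(index_command("transcripts.fa", "index", 31)))
    print(" ".join(quant_command("index", "IU", "reads_1",
                                 mates1="reads_1.fq", mates2="reads_2.fq")))
```

The same index directory can be passed to `quant_command` for every new read set, which is why indexing is kept as a separate step.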
Prerequisites
A CyVerse account. (Register for a CyVerse account here - user.cyverse.org)
Mandatory arguments
Transcript file name (in fasta format)
FASTQ files (either SE or PE reads)
Fragment Library Type (specify the format of the library; more details: http://sailfish.readthedocs.io/en/master/library_type.html)
File type (enter whether the library is paired-end or single-end)
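The fragment library type has to be consistent with the file type. As a rough sketch, assuming the library type strings follow the scheme on the linked page (an optional relative orientation I, O or M for paired-end reads, followed by U for unstranded or SF/SR for stranded), a check could look like this. The function and constant names are hypothetical and not part of Sailfish.

```python
# Assumed scheme: [orientation] + strandedness, e.g. "IU", "ISF", "SR".
STRANDEDNESS = ("U", "SF", "SR")
SINGLE_END_TYPES = set(STRANDEDNESS)
PAIRED_END_TYPES = {o + s for o in "IOM" for s in STRANDEDNESS}


def check_library_type(lib_type: str, paired: bool) -> bool:
    """Return True if lib_type is plausible for the given file type."""
    allowed = PAIRED_END_TYPES if paired else SINGLE_END_TYPES
    return lib_type.upper() in allowed


if __name__ == "__main__":
    print(check_library_type("IU", paired=True))
    print(check_library_type("IU", paired=False))
```

Under this assumed scheme, a paired-end type such as "IU" is rejected for a single-end library, because single-end reads have no relative orientation.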
Optional arguments
Number of bootstraps (this option takes a positive integer that sets the number of bootstrap samples to compute. The more samples computed, the better the estimates of variance, but the more computation and time required)
Number of Gibbs samples (this option also produces samples that allow the variance in abundance estimates to be estimated. In this case, however, the samples are generated by posterior Gibbs sampling over the fragment equivalence classes rather than by bootstrapping)
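To illustrate why more bootstrap samples give a steadier variance estimate at a higher cost, here is a toy sketch. It is not Sailfish's procedure: it simply resamples a made-up list of read-to-transcript assignments with replacement and recounts one transcript each time. The data, names and seed are assumptions for illustration only.

```python
import random
import statistics
from typing import List, Sequence


def bootstrap_counts(assignments: Sequence[str], target: str,
                     n_bootstraps: int, seed: int = 0) -> List[int]:
    """Resample the reads with replacement and count reads assigned to target."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_reads = len(assignments)
    counts = []
    for _ in range(n_bootstraps):
        resample = rng.choices(assignments, k=n_reads)
        counts.append(resample.count(target))
    return counts


if __name__ == "__main__":
    # Made-up data: 300 reads from tx1 and 700 from tx2.
    reads = ["tx1"] * 300 + ["tx2"] * 700
    for n in (10, 100, 1000):
        counts = bootstrap_counts(reads, "tx1", n_bootstraps=n)
        print(n, statistics.mean(counts), statistics.variance(counts))
```

Each extra bootstrap sample repeats the whole resampling step, so the run time grows roughly in proportion to the number of samples requested.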
Test/sample data
The following test data for Sailfish_align_quant-0.9.2 are provided in /iplant/home/shared/iplantcollaborative/example_data/Salmon:
Transcript file - transcripts.fa
FASTQ files - reads_1.fq and reads_2.fq
Run Sailfish_align_quant-0.9.2 on the FASTQ files (reads_1.fq and reads_2.fq) using ‘transcripts.fa’.
Results
Successful execution of Sailfish_align_quant-0.9.2 creates a directory named reads_1, which contains several files and directories:
logs
Index
reads_1
quant.sf: When the quantification step is finished, the directory <quant_dir> will contain a file named “quant.sf” (and, if bias correction is enabled, an additional file named “quant_bias_corrected.sf”). This file contains the result of the Sailfish quantification step. Its columns are listed in the last of the header lines beginning with ‘#’. Specifically, the columns are (1) Transcript ID, (2) Transcript Length, (3) Transcripts per Million (TPM) and (4) Estimated number of reads (an estimate of the number of reads drawn from this transcript given the transcript’s relative abundance and length).
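A quant.sf file can be loaded with a few lines of code. The sketch below assumes the layout described above: header lines starting with ‘#’, followed by tab-separated rows whose first four fields are transcript ID, length, TPM and estimated number of reads. The record and function names are hypothetical, and the sample rows are made up for illustration.

```python
from typing import Dict, Iterable, NamedTuple


class QuantRecord(NamedTuple):
    length: int
    tpm: float
    num_reads: float


def parse_quant_sf(lines: Iterable[str]) -> Dict[str, QuantRecord]:
    """Map transcript ID to its length, TPM and estimated read count."""
    records = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '#' header lines
        name, length, tpm, num_reads = line.split("\t")[:4]
        records[name] = QuantRecord(int(length), float(tpm), float(num_reads))
    return records


if __name__ == "__main__":
    # Made-up rows; in practice pass an open file: parse_quant_sf(open(path))
    sample = [
        "# Name\tLength\tTPM\tNumReads\n",
        "tx1\t1000\t250000.0\t30.0\n",
        "tx2\t2000\t750000.0\t180.0\n",
    ]
    for name, rec in parse_quant_sf(sample).items():
        print(name, rec.length, rec.tpm, rec.num_reads)
```

Keeping the records in a dictionary keyed by transcript ID makes it easy to look up a single transcript or to compare two runs transcript by transcript.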
More information on the tool can be found here - http://sailfish.readthedocs.io/en/master/index.html