Sailfish_align_quant-0.9.2

Alert:

The iPlant App Store is currently being restructured, and apps are being moved to an HPC environment. During this transition, users may occasionally be unable to locate or use apps that are listed in our tutorials. In many cases, these apps can be found with the search bar at the top of the Apps window in the DE. To improve the chance of a successful search, do not enter the entire app name and version number; search only for the portion that refers to the app's function or origin (e.g. 'SOAPdenovo' instead of 'SOAPdenovo-Trans 1.01'). In critical cases, please report your concern to the iPlant Ask forum or to support@iplantcollaborative.org. Thank you for your patience.

The DE Quick Start tutorial provides an introduction to basic DE functionality and navigation.

Please work through the tutorial and add your comments at the bottom of this page, or send comments by email to upendra@cyverse.org. Thank you.

Rationale and background:

Rob Patro, Stephen M. Mount, and Carl Kingsford (2014). Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms. Nature Biotechnology (doi:10.1038/nbt.2862)

Sailfish is a tool for transcript quantification from RNA-seq data. It requires a set of target transcripts (either from a reference or a de novo assembly) to quantify. All you need to run Sailfish is a FASTA file containing your reference transcripts and one or more FASTA/FASTQ files containing your reads. Sailfish runs in two phases: indexing and quantification. The indexing step is independent of the reads and only needs to be run once for a particular set of reference transcripts and choice of k (the k-mer size). The quantification step is specific to the set of RNA-seq reads and is therefore run more frequently, typically once per sample.
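To illustrate why the index depends only on the transcripts and k, here is a toy sketch in Python. It is not Sailfish's actual data structure or quantification algorithm; the function names, the tiny sequences, and the simple hit counting are all assumptions made for illustration. The point is only that the k-mer index is built once from the transcripts and then reused for each set of reads.

```python
from collections import defaultdict


def build_kmer_index(transcripts, k):
    """Map each k-mer to the set of transcript names that contain it.

    Only the transcripts and k are needed, so the result can be reused
    for any number of read sets.
    """
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return dict(index)


def count_hits(index, reads, k):
    """Count, per transcript, how many read k-mers occur in the index.

    This raw counting is only a stand-in for real quantification.
    """
    hits = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k + 1):
            for name in index.get(read[i:i + k], ()):
                hits[name] += 1
    return dict(hits)


# Hypothetical toy data, not real transcripts or reads.
transcripts = {"tx1": "ACGTACGTGA", "tx2": "TTGACCGTAA"}

index = build_kmer_index(transcripts, k=4)       # "indexing": run once
sample_a = count_hits(index, ["ACGTAC"], k=4)    # "quantification": per sample
sample_b = count_hits(index, ["GACCGT"], k=4)    # same index, different reads
```

Changing k or the transcript set would require rebuilding the index, which matches the statement above that indexing is tied to a particular reference and choice of k.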

Pre-Requisites (for both versions 1.1b and 2.0)

A CyVerse account. (Register for a CyVerse account here - user.cyverse.org)
Mandatory arguments
1. Transcript file name (in fasta format)
2. FASTQ files (either SE or PE reads)
3. Fragment Library Type (specify the format of the library; more details at http://sailfish.readthedocs.io/en/master/library_type.html)
4. File type (specify whether the library is paired-end or single-end)
Optional arguments
1. Number of bootstraps (this option takes a positive integer that sets the number of bootstrap samples to compute. The more samples computed, the better the estimates of variance, but the more computation and time required)
2. Number of Gibbs samples (this option also produces samples that allow the variance in abundance estimates to be estimated. However, in this case the samples are generated using posterior Gibbs sampling over the fragment equivalence classes rather than bootstrapping)
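
For readers who want to see how these arguments relate to the underlying command-line tool, the following Python sketch assembles the two Sailfish commands (index, then quant). It is a hedged illustration, not the app's actual wrapper script: the helper function names, the directory names, and the "IU" library type in the usage example are assumptions, and the flags follow the Sailfish documentation as best recalled, so check them against `sailfish index --help` and `sailfish quant --help` for your installed version. The sketch only builds the commands and does not run them.

```python
def sailfish_index_cmd(transcripts, index_dir, k=None):
    """Build the indexing command; k is optional (tool default if omitted)."""
    cmd = ["sailfish", "index", "-t", transcripts, "-o", index_dir]
    if k is not None:
        cmd += ["-k", str(k)]
    return cmd


def sailfish_quant_cmd(index_dir, lib_type, reads, out_dir,
                       num_bootstraps=0, num_gibbs_samples=0):
    """Build the quantification command for single-end or paired-end reads."""
    cmd = ["sailfish", "quant", "-i", index_dir, "-l", lib_type]
    if len(reads) == 1:
        cmd += ["-r", reads[0]]                    # single-end (unmated) reads
    elif len(reads) == 2:
        cmd += ["-1", reads[0], "-2", reads[1]]    # paired-end mates
    else:
        raise ValueError("expected one (SE) or two (PE) read files")
    if num_bootstraps > 0:
        cmd += ["--numBootstraps", str(num_bootstraps)]
    if num_gibbs_samples > 0:
        cmd += ["--numGibbsSamples", str(num_gibbs_samples)]
    cmd += ["-o", out_dir]
    return cmd


# Usage with the file names from this tutorial; "Index", "reads_1" and the
# library type "IU" are assumed values for illustration.
index_cmd = sailfish_index_cmd("transcripts.fa", "Index")
quant_cmd = sailfish_quant_cmd("Index", "IU", ["reads_1.fq", "reads_2.fq"],
                               "reads_1", num_bootstraps=100)
print(" ".join(index_cmd))
print(" ".join(quant_cmd))
```

Returning the command as a list keeps it suitable for passing to `subprocess.run` without shell quoting issues, should you choose to execute it outside the DE.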

Test/sample data

The following test data are provided for testing Sailfish_align_quant-0.9.2 at /iplant/home/shared/iplantcollaborative/example_data/Salmon:

Transcript file - transcripts.fa
FASTQ files - reads_1.fq and reads_2.fq

Run Sailfish_align_quant-0.9.2 on the FASTQ files (reads_1.fq and reads_2.fq) using ‘transcripts.fa’ as the transcript file.

Results

Successful execution of Sailfish_align_quant-0.9.2 will create a directory named reads_1. The directory will contain several files and directories:

logs
Index
reads_1
1. quant.sf: When the quantification step is finished, the directory <quant_dir> will contain a file named “quant.sf” (and, if bias correction is enabled, an additional file named “quant_bias_corrected.sf”). This file contains the result of the Sailfish quantification step. Its columns are listed in the last of the header lines beginning with ‘#’. Specifically, the columns are (1) Transcript ID, (2) Transcript Length, (3) Transcripts per Million (TPM) and (4) Estimated number of reads (an estimate of the number of reads drawn from this transcript given the transcript’s relative abundance and length).
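
The file layout described above (header lines starting with ‘#’, the last of which names the columns, followed by one row per transcript) can be read with a few lines of Python. This is a minimal sketch under stated assumptions: the data rows are taken to be tab-separated, the column names Name, Length, TPM and NumReads are assumed rather than confirmed for this app version, and the sample values are made up purely for illustration. Inspect the header of your own quant.sf before relying on it.

```python
import io


def read_quant_sf(handle):
    """Parse quant.sf-style text into (column_names, list_of_row_dicts)."""
    header = None
    rows = []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Keep overwriting so the LAST '#' line supplies the column names.
            header = line.lstrip("#").split()
            continue
        if header is None:
            raise ValueError("no '#' header line found before the data rows")
        rows.append(dict(zip(header, line.split("\t"))))
    return header, rows


# Hypothetical file content; names and numbers are invented for this example.
sample_text = (
    "# sailfish (hypothetical header comment)\n"
    "# Name\tLength\tTPM\tNumReads\n"
    "tx1\t1500\t250000.0\t120.5\n"
    "tx2\t800\t750000.0\t210.0\n"
)

columns, records = read_quant_sf(io.StringIO(sample_text))
tpm_by_transcript = {r["Name"]: float(r["TPM"]) for r in records}
```

To read a real result, open the file from the output directory instead of the in-memory sample, for example `with open("reads_1/quant.sf") as fh: columns, records = read_quant_sf(fh)` (the path is an assumption based on the output directory named in this tutorial).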

More information on the tool can be found here - http://sailfish.readthedocs.io/en/master/index.html