DESeq
Summary
Performs differential expression analysis on count-based expression data sets.
Introduction
DESeq identifies differentially expressed genes using a model based on the negative binomial distribution. Earlier methods for identifying differentially expressed genes assumed a Poisson distribution; however, the Poisson model does not account for the extra variation (overdispersion) found in expression data. DESeq uses a negative binomial distribution (similar to edgeR) and estimates the variance from the data, even when only a few replicates are available.
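To make the difference between the two models concrete, here is a minimal sketch comparing their variances. It assumes the standard negative binomial parameterization (variance = mean + dispersion × mean²), which is not necessarily the exact form DESeq uses internally, and the dispersion value 0.1 is a made-up number for illustration only.

```python
def poisson_variance(mean):
    """Under a Poisson model the variance always equals the mean."""
    return mean


def negative_binomial_variance(mean, dispersion):
    """Standard NB parameterization: the dispersion adds a mean-squared term."""
    return mean + dispersion * mean ** 2


# Hypothetical dispersion, chosen only to illustrate overdispersion.
DISPERSION = 0.1

for mean in (10, 100, 1000):
    print(
        f"mean={mean:5d}  "
        f"Poisson variance={poisson_variance(mean):8.1f}  "
        f"NB variance={negative_binomial_variance(mean, DISPERSION):10.1f}"
    )
```

The gap between the two variances grows with the mean, which is why a Poisson model tends to understate the variability of highly expressed genes.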
The input is a tab-delimited file containing genes and their expression values. The results include two files detailing the results of differential expression testing: one that includes all of the results, and one that includes only the results that pass the false-discovery rate cutoff. Also included are plots of the estimated dispersions, the log fold changes against the mean normalized counts, and a histogram of p-values. The plots are purely for visualization purposes and may not be necessary for all users.
Reference:
Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11(10):R106. doi: 10.1186/gb-2010-11-10-r106. Epub 2010 Oct 27.
http://genomebiology.com/2010/11/10/R106
Quick Start
- To use DESeq, your input file must be tab-delimited. You must also know the library type (either "single-end" or "paired-end") for each column in your input file.
- Resources: http://bioconductor.org/packages/release/bioc/vignettes/DESeq/inst/doc/DESeq.pdf
Test Data
Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> DESeq.
Input File(s)
Use DESeq_test_data.tsv from the directory above as test input. This example data comes from the RNA-Seq Drosophila example used in the DESeq paper, which was based on unpublished data from B. Wilczynski, Y.-H. Liu, N. Delhomme and E. Furlong.
Parameters Used in App
When the app is run in the Discovery Environment, use the following parameters with the above input file(s) to get the output listed under Output Files.
...
Inputs
...
- Tab-delimited input: DESeq_test_data.tsv from the directory listed above.
- Column where feature names are found: 1
...
Experiment Design
...
- Comma-separated list of factors for the data columns in your file: untreated,untreated,untreated,untreated,treated,treated,treated
- Comma-separated list of library types for each factor listed above: single-end,single-end,paired-end,paired-end,single-end,paired-end,paired-end
- Comma-separated pair of factor for comparison: untreated,treated
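Because the factor and library-type lists must line up with the data columns of the input file, it can help to check them before submitting the job. The following is a minimal sketch of such a check, not part of the app itself: the `check_design` helper and the header line are hypothetical, and the real column names in DESeq_test_data.tsv may differ.

```python
import csv
import io

ALLOWED_LIBRARY_TYPES = {"single-end", "paired-end"}


def check_design(header, feature_column, factors, library_types, comparison):
    """Return a list of problems; an empty list means the settings are consistent."""
    problems = []
    # Every column except the feature-name column is treated as a data column.
    n_data = len(header) - 1
    if not 1 <= feature_column <= len(header):
        problems.append("feature column is outside the file")
    if len(factors) != n_data:
        problems.append(f"{len(factors)} factors for {n_data} data columns")
    if len(library_types) != len(factors):
        problems.append(
            f"{len(library_types)} library types for {len(factors)} factors"
        )
    for library_type in library_types:
        if library_type not in ALLOWED_LIBRARY_TYPES:
            problems.append(f"unknown library type: {library_type}")
    if len(comparison) != 2:
        problems.append("comparison must name exactly two factors")
    for level in comparison:
        if level not in factors:
            problems.append(f"comparison factor not in factor list: {level}")
    return problems


# Hypothetical header line; the real column names in the test file may differ.
example = "feature\ts1\ts2\ts3\ts4\ts5\ts6\ts7\n"
header = next(csv.reader(io.StringIO(example), delimiter="\t"))

problems = check_design(
    header,
    feature_column=1,
    factors="untreated,untreated,untreated,untreated,treated,treated,treated".split(","),
    library_types=(
        "single-end,single-end,paired-end,paired-end,"
        "single-end,paired-end,paired-end"
    ).split(","),
    comparison="untreated,treated".split(","),
)
print(problems or "design is consistent")
```

To check a real file, replace `io.StringIO(example)` with an open file handle and read only the first row.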
...
Statistical Options
...
- Minimum false-discovery rate: 0.1
- Quantile for removing insignificant genes: 0.4
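The two statistical options above can be illustrated with a short sketch. It rests on two assumptions that the app description does not confirm: that the quantile setting drops features whose total count falls at or below that quantile of all totals (as in the independent filtering step of the DESeq vignette), and that the false-discovery rate is controlled with Benjamini-Hochberg adjusted p-values. The function names and the toy numbers are hypothetical.

```python
def quantile(values, q):
    """Linear-interpolation quantile (the default method in R and NumPy)."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def filter_by_total_count(counts, q):
    """Keep features whose summed count exceeds the q-th quantile of all sums."""
    totals = {feature: sum(row) for feature, row in counts.items()}
    threshold = quantile(list(totals.values()), q)
    return {f: row for f, row in counts.items() if totals[f] > threshold}


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Toy counts and p-values, invented for illustration only.
toy_counts = {"g1": [1, 0], "g2": [1, 1], "g3": [2, 1], "g4": [3, 1], "g5": [4, 1]}
kept = filter_by_total_count(toy_counts, 0.4)
print(sorted(kept))

toy_p_values = [0.01, 0.04, 0.03, 0.5]
adjusted = benjamini_hochberg(toy_p_values)
significant = [p for p in adjusted if p < 0.1]
print(adjusted, len(significant))
```

Filtering before testing reduces the number of hypotheses, which is the reason the vignette gives for applying it ahead of the multiple-testing adjustment.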
Output Files
For the test case, the output files you will find in the example_data directory are:
DESeq_Dispersion.png - plot of the estimated dispersion
DESeq_MAPlot.png - plot of log fold changes against the mean normalized counts
DESeq_pValues.png - histogram of p-values
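DESeq_test_results_significant.txt - only the results that pass the false-discovery rate cutoff
DESeq_test_results.txt - all results of the differential expression testing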