DESeq2

 


Summary

Performs differential expression analysis on count-based expression data sets.

Introduction

DESeq2 identifies differentially expressed genes using a model based on the negative binomial distribution. Earlier methods for identifying differentially expressed genes assumed a Poisson distribution; however, a Poisson model fixes the variance equal to the mean and so does not account for the additional variation (overdispersion) found in expression data. Like edgeR, DESeq2 uses a negative binomial distribution, which allows the variance to be estimated even when there are few replicates.
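To make the overdispersion point concrete, the sketch below compares the variance implied by a Poisson model with the variance of a negative binomial model written in the common form variance = mean + dispersion × mean². The function names and the dispersion value are illustrative assumptions, not values produced by the app.

```python
def poisson_variance(mean):
    """Under a Poisson model the variance always equals the mean."""
    return mean


def negative_binomial_variance(mean, dispersion):
    """Negative binomial variance: the Poisson term plus a quadratic overdispersion term."""
    return mean + dispersion * mean ** 2


# Hypothetical gene: mean count of 100 and an assumed dispersion of 0.25.
mean_count = 100.0
dispersion = 0.25

print(poisson_variance(mean_count))                        # 100.0
print(negative_binomial_variance(mean_count, dispersion))  # 2600.0
```

With these assumed numbers the negative binomial model allows 26 times more variance than the Poisson model, which is the kind of gap a Poisson-based test would misread as a significant difference.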

The input is a tab-delimited file containing genes and their expression values. The output includes two files detailing the results of differential expression testing: one that contains all of the results, and one that contains only the results that meet the false-discovery rate cutoff. Three plots are also produced: the estimated dispersions, the log fold changes against the mean normalized counts, and a histogram of p-values. The plots are purely for visualization and may not be necessary for all users.
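The split between the full results file and the significant-only file can be illustrated with a short sketch. It assumes the Benjamini-Hochberg procedure, which is the default p-value adjustment in DESeq2. Whether the app applies exactly this procedure is an assumption, and the gene names and p-values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values never increase as the raw p-value decreases.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical genes and raw p-values.
genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
p_values = [0.001, 0.02, 0.04, 0.3, 0.8]

fdr_cutoff = 0.1  # the value used in the Quick Start parameters
adjusted = benjamini_hochberg(p_values)
significant = [g for g, q in zip(genes, adjusted) if q < fdr_cutoff]
print(significant)  # ['geneA', 'geneB', 'geneC']
```

In this sketch all five genes would appear in the full results, while only the three listed would appear in the significant-only results.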

Reference:
Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11(10):R106. doi: 10.1186/gb-2010-11-10-r106. Epub 2010 Oct 27. http://genomebiology.com/2010/11/10/R106

Quick Start

Test Data

Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> DESeq.

Input File(s)

Use DESeq_test_data.tsv from the directory above as test input. This example data comes from the RNA-Seq Drosophila example used in the DESeq paper.

Parameters Used in App

When the app is run in the Discovery Environment, use the following parameters with the above input file to get the output described in the next section.

Inputs
  • Tab-delimited input: DESeq_test_data.tsv from the directory listed above.
  • Column where feature names are found: 1
Experiment Design
  • Comma-separated list of factors for the data columns in your file: untreated,untreated,untreated,untreated,treated,treated,treated
  • Comma-separated list of library types for each factor listed above: single-end,single-end,paired-end,paired-end,single-end,paired-end,paired-end
  • Comma-separated pair of factors for comparison: untreated,treated
Statistical Options
  • Minimum false-discovery rate: 0.1
  • Quantile for removing insignificant genes: 0.4
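Because the factors and library types are matched to data columns by position, it can help to check the three comma-separated lists before submitting a job. The sketch below is a stand-alone check, not part of the app. The helper names are hypothetical, and the count of seven data columns is an assumption taken from the seven factors listed above.

```python
def parse_list(text):
    """Split a comma-separated parameter string into trimmed, non-empty items."""
    return [item.strip() for item in text.split(",") if item.strip()]


def check_design(factors, library_types, comparison, n_data_columns):
    """Return a list of problems found in the experiment design (empty if none)."""
    problems = []
    if len(factors) != n_data_columns:
        problems.append("number of factors does not match number of data columns")
    if len(library_types) != len(factors):
        problems.append("number of library types does not match number of factors")
    if len(comparison) != 2:
        problems.append("comparison must name exactly two factors")
    else:
        for level in comparison:
            if level not in factors:
                problems.append("comparison factor not found in factor list: " + level)
    return problems


factors = parse_list("untreated,untreated,untreated,untreated,treated,treated,treated")
library_types = parse_list(
    "single-end,single-end,paired-end,paired-end,single-end,paired-end,paired-end"
)
comparison = parse_list("untreated,treated")

# Assumption: the input file has seven data columns after the feature-name column.
problems = check_design(factors, library_types, comparison, n_data_columns=7)
print(problems or "design looks consistent")
```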

Output Files

For the test case, the output files you will find in the example_data directory are:

DESeq_Dispersion.png - plot of the estimated dispersion
DESeq_MAPlot.png - plot of log fold changes against the mean normalized counts
DESeq_pValues.png - histogram of p-values
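DESeq_test_results_significant.txt - only the results that meet the false-discovery rate cutoff
DESeq_test_results.txt - all of the results of differential expression testing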

RPlots - heatmaps of the 30 most highly expressed genes (not necessarily those with the largest fold change). It may make more sense to plot only the most significant genes rather than the whole gene set. The heatmaps show, respectively, the raw counts, the regularized log transformed data, and the variance stabilizing transformed data.
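The difference between "most highly expressed" and "largest fold change" can be seen in a small sketch of the selection step. It assumes that genes are ranked by their mean count across samples. The function name, gene names, and counts are hypothetical, and the app's actual ranking may differ.

```python
def top_expressed(counts, n=30):
    """Return the names of the n genes with the highest mean count across samples."""
    means = {gene: sum(values) / len(values) for gene, values in counts.items()}
    return sorted(means, key=means.get, reverse=True)[:n]


# Hypothetical counts for four genes across three samples.
counts = {
    "geneA": [10, 12, 8],
    "geneB": [500, 450, 520],
    "geneC": [0, 1, 0],
    "geneD": [90, 110, 100],
}

print(top_expressed(counts, n=2))  # ['geneB', 'geneD']
```

A gene such as geneB is selected because its counts are high overall, even though they barely change between samples.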