edgeR with Fisher's Exact Test

Summary

Performs differential expression analysis on count-based expression data sets using edgeR. Significance testing is pairwise, via Fisher's Exact Test.

Introduction

While many tools determine differential expression for microarray data, they assume a continuous expression response (fluorescence intensity), whereas RNA-Seq, ChIP-Seq, SAGE, DGE and proteomics data generate expression counts. edgeR (empirical analysis of DGE in R) compares expression counts from different experimental data sets and uses Fisher's Exact Test to identify differentially expressed gene products. The edgeR approach uses a negative binomial distribution and simplifies estimation of overdispersion by assuming that mean and variance are related, which allows it to be applied to experiments with small numbers of replicates.

At least one of the experimental conditions must have replicates. edgeR assumes a negative binomial distribution (which reduces to a Poisson distribution when there is no overdispersion) and uses empirical Bayes estimation to moderate the dispersion estimates before applying Fisher's Exact Test to identify differential expression.
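The relationship between mean and variance mentioned above can be written out as follows. This is the usual negative binomial parameterisation; the symbols (Y for the count of gene product g in sample i, mu for its mean, phi for the dispersion) are notation chosen here for illustration and do not appear elsewhere on this page.

```latex
\operatorname{Var}(Y_{gi}) = \mu_{gi} + \phi_g \, \mu_{gi}^{2},
\qquad
\phi_g = 0 \;\Rightarrow\; \operatorname{Var}(Y_{gi}) = \mu_{gi} \quad \text{(Poisson)}
```

With a dispersion of zero the variance equals the mean, which is the Poisson case; any positive dispersion adds variance that grows with the square of the mean.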

Newer versions of edgeR allow comparison of more than two groups.

Reference:

Robinson MD, McCarthy DJ, Smyth GK. Bioinformatics. 2010 Jan 1;26(1):139-40. "edgeR: a Bioconductor package for differential expression analysis of digital gene expression data." http://bioinformatics.oxfordjournals.org/content/26/1/139.long


Quick Start

Inputs

edgeR requires three inputs:

(1) counts matrix

This is a tab-separated file in which each row represents a gene product (transcript, gene, exon, protein) and each column holds the expression counts for one sample. Example:

Gene   wt1   wt2   hy5   hy5_2
AT1G01070.1   1   0   0   0
AT1G01070.2   1   0   0   0
AT1G01090.1   46   48   33   32
AT1G01100.1   116   115   59   66
AT1G01100.2   116   115   59   66
AT1G01100.3   82   83   35   42
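As an illustration of the layout described above, here is a minimal Python sketch that reads such a file into memory. It is not the app's own code; the function name read_counts and its name_column argument are hypothetical, and the sketch assumes the file is truly tab-separated with a single header row.

```python
import csv
import io


def read_counts(handle, name_column=1):
    """Parse a tab-separated counts matrix.

    Returns (sample_names, counts), where counts maps each feature name
    to its list of integer counts, one per sample column.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    idx = name_column - 1  # name_column is 1-based, as in the app's form
    samples = header[:idx] + header[idx + 1:]
    counts = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        values = row[:idx] + row[idx + 1:]
        counts[row[idx]] = [int(v) for v in values]
    return samples, counts


# Two rows taken from the example matrix above.
example = (
    "Gene\twt1\twt2\thy5\thy5_2\n"
    "AT1G01070.1\t1\t0\t0\t0\n"
    "AT1G01090.1\t46\t48\t33\t32\n"
)
samples, counts = read_counts(io.StringIO(example))
print(samples)
print(counts["AT1G01090.1"])
```

Converting with int() makes the sketch fail early on non-integer values, which is a deliberate choice here because the analysis expects raw counts.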

(2) comma-separated list of factors for the data columns in your counts matrix file (group factor)

This list denotes the experimental group of each sample. Alphanumeric only.
Example: wt,wt,hy5,hy5

(3) comma-separated pair of factors for comparison

This list denotes which experimental groups should be compared. Alphanumeric only.

Example: wt,hy5
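The two comma-separated inputs have to agree with the counts matrix: one group label per sample column, and a comparison pair drawn from those labels. The following Python sketch shows one way to check this before submitting a job. The function check_design is hypothetical and is not part of edgeR or the app; the exact checks the app performs are not documented here.

```python
def check_design(sample_names, group_factor, comparison):
    """Validate the group factor and comparison pair against the sample columns.

    Returns a dict mapping each compared group to its sample column names.
    """
    groups = [g.strip() for g in group_factor.split(",")]
    pair = [g.strip() for g in comparison.split(",")]

    if len(groups) != len(sample_names):
        raise ValueError(
            f"{len(groups)} group labels given for {len(sample_names)} sample columns"
        )
    for label in groups + pair:
        # ASCII letters and digits only; empty labels are rejected too
        if not (label.isascii() and label.isalnum()):
            raise ValueError(f"label {label!r} is not alphanumeric")
    if len(pair) != 2 or pair[0] == pair[1]:
        raise ValueError("comparison must name exactly two different groups")
    for label in pair:
        if label not in groups:
            raise ValueError(f"group {label!r} is not in the group factor")

    return {
        label: [s for s, g in zip(sample_names, groups) if g == label]
        for label in pair
    }


design = check_design(["wt1", "wt2", "hy5", "hy5_2"], "wt,wt,hy5,hy5", "wt,hy5")
print(design)
```

Note that the column header hy5_2 contains an underscore, which is fine: the alphanumeric restriction applies to the group labels, not to the column headers in the matrix file.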

Resources: http://bioconductor.org/packages/release/bioc/html/edgeR.html

Test Data

Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> edgeR_exact_test.

Analysis

Inputs

Use ccombine_result.txt from the directory above as test input.

Column where feature names are found: 1

This parameter indicates the column in the input file that contains the gene product names/accessions.

Experiment Design

Comma-separated list of factors for the data columns in your file: wt,wt,hy5,hy5

This input must use alphanumeric characters only and must list one factor for each expression sample (column) in the matrix file (ccombine_result.txt).

Comma-separated pair of factors for comparison: wt,hy5

This input must use alphanumeric characters only and must list the two conditions that are to be compared to determine differential expression.

Statistical Options

Minimum sum count across columns: 5

Rows whose counts, summed across all columns, fall below this number are excluded from the analysis.
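To make the effect of this filter concrete, here is a small Python sketch applied to the example matrix from the Quick Start. The function filter_low_counts is a hypothetical helper written for illustration, not the app's code; it assumes that a row whose sum equals the threshold is kept.

```python
def filter_low_counts(counts, min_sum=5):
    """Keep only the rows whose total count across all samples is at least min_sum."""
    return {name: row for name, row in counts.items() if sum(row) >= min_sum}


# The example counts matrix from the Quick Start section.
counts = {
    "AT1G01070.1": [1, 0, 0, 0],
    "AT1G01070.2": [1, 0, 0, 0],
    "AT1G01090.1": [46, 48, 33, 32],
    "AT1G01100.1": [116, 115, 59, 66],
    "AT1G01100.2": [116, 115, 59, 66],
    "AT1G01100.3": [82, 83, 35, 42],
}
kept = filter_low_counts(counts, min_sum=5)
print(sorted(kept))
```

With a threshold of 5, the two AT1G01070 rows (each with a total of 1) are dropped and the other four rows remain.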

Select method for multiple testing correction: BH

Holm - step-down correction based on Bonferroni (a simple test that is uniformly more powerful than the Bonferroni correction)
Hochberg - step-up procedure that is less conservative/stringent than Bonferroni's test
Hommel - also derived from the Bonferroni method
Bonferroni - simplest and most conservative method to control type I errors (i.e., false positives, where a true null hypothesis is rejected)
BH - Benjamini-Hochberg correction, which controls the false discovery rate
BY - Benjamini & Yekutieli correction, which controls the false discovery rate under dependence between tests
FDR - False Discovery Rate (in R's p.adjust function, "fdr" is an alias for BH)
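To show how two of these corrections differ in practice, the sketch below implements the Bonferroni and Benjamini-Hochberg adjustments in plain Python. It follows the standard definitions of both procedures and is meant only as an illustration; the function names are hypothetical, and the app itself presumably relies on R rather than on code like this.

```python
def bonferroni_adjust(pvalues):
    """Multiply each P-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest P-value down, tracking the running minimum so
    # that the adjusted values stay monotone in the original P-values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank when sorted ascending
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
print(bonferroni_adjust(pvals))
print(bh_adjust(pvals))
```

For these four P-values, Bonferroni leaves only two results below 0.05, while BH leaves all four below 0.05, which illustrates why BH is the less stringent choice when many gene products are tested.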

Minimum FDR returned: 0.05

This sets the FDR cutoff for the results returned. It is usually 0.01 or 0.05.

Custom-specified dispersion: not used in this test.

Entering a value will override edgeR's dispersion calculation. The value must be between 0.0 and 1.0. In experiments without replicates, the dispersion goes to 0.0.

Output Files

edgeR produces two tabular text files as output. Both files are tab-delimited and include the names/accessions from the input file with their calculated P-values and adjusted P-values.

edgeR-all.txt - this file contains the calculated P-values for all accessions/names listed in the input file.

edgeR-significant.txt - this file contains only the accessions/names that are considered differentially expressed based upon the statistical analysis performed.

For the test case, the output files you will find in the example_data directory are named edgeR-all.txt and edgeR-significant.txt.

Sample of example test files:

edgeR-all.txt
logConc   logFC   P.Value   adj.P.Val
Locus_7502_Transcript_1/1_Confidence_1.000_Length_280.mrna1.exon1   -7.76628523647046   -1.93212021058096   6.42270046482157e-219   3.14416878554875e-214
Locus_885_Transcript_1/1_Confidence_1.000_Length_365.mrna1.exon1   -10.6497833642559   -3.35713268429383   2.43547058935892e-160   5.96130136157382e-156

edgeR-significant.txt
logConc   logFC   P.Value   adj.P.Val
Locus_7502_Transcript_1/1_Confidence_1.000_Length_280.mrna1.exon1   -7.76628523647046   -1.93212021058096   6.42270046482157e-219   3.14416878554875e-214
Locus_885_Transcript_1/1_Confidence_1.000_Length_365.mrna1.exon1   -10.6497833642559   -3.35713268429383   2.43547058935892e-160   5.96130136157382e-156
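The sample output has one quirk worth noting: the header line names four columns, while each data row carries five fields because the accession comes first. The Python sketch below reads such a file and lists the accessions whose adj.P.Val is at or below a cutoff. It is a hedged illustration only; the function significant_rows is hypothetical, the third data row is an invented placeholder, and the assumption that edgeR-significant.txt is derived from edgeR-all.txt by exactly this rule is not confirmed by this page.

```python
import csv
import io


def significant_rows(handle, max_fdr=0.05):
    """Return the accessions whose adj.P.Val is at or below max_fdr."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    col = header.index("adj.P.Val")
    kept = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Data rows start with the accession, which the header line omits,
        # so shift the column index by the difference in field counts.
        offset = len(row) - len(header)
        if float(row[col + offset]) <= max_fdr:
            kept.append(row[0])
    return kept


sample = (
    "logConc\tlogFC\tP.Value\tadj.P.Val\n"
    "Locus_7502_Transcript_1/1_Confidence_1.000_Length_280.mrna1.exon1\t"
    "-7.76628523647046\t-1.93212021058096\t6.42270046482157e-219\t3.14416878554875e-214\n"
    "Locus_885_Transcript_1/1_Confidence_1.000_Length_365.mrna1.exon1\t"
    "-10.6497833642559\t-3.35713268429383\t2.43547058935892e-160\t5.96130136157382e-156\n"
    # Hypothetical row, invented here only to show a feature being excluded.
    "hypothetical_feature\t-9.0\t0.1\t0.4\t0.6\n"
)
hits = significant_rows(io.StringIO(sample), max_fdr=0.05)
print(hits)
```

Computing the offset per row means the same function also works on files whose header does include a name column.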

Tool Source for App

http://bioconductor.org/packages/release/bioc/html/edgeR.html