edgeR with Fisher's Exact Test

Summary

Performs differential expression analysis on count-based expression data sets using edgeR. Significance is tested pairwise with Fisher's Exact Test.

Introduction

Many tools determine differential expression for microarray data, but they assume a continuous expression response (fluorescence intensity), whereas RNA-Seq, ChIP-Seq, SAGE, DGE and proteomics data generate expression counts. edgeR (Empirical analysis of Digital Gene Expression data in R) compares expression counts from different experimental data sets and uses Fisher's Exact Test to identify differentially expressed gene products. edgeR models counts with a negative binomial distribution and simplifies estimation of overdispersion by assuming that the variance is determined by the mean, which allows it to be applied to experiments with small numbers of replicates.
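
The mean-variance assumption can be written compactly. In the notation below (ours, not the app's), a count y with mean μ under edgeR's negative binomial model has variance

```latex
\operatorname{Var}(y) = \mu + \phi\,\mu^{2}, \qquad \phi \ge 0
```

where φ is the dispersion. The μ term is Poisson (sampling) noise and the φμ² term is the extra variation between replicates, so a single parameter per gene links the variance to the mean.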

At least one of the experimental conditions must have replicates. edgeR assumes a negative binomial distribution (which reduces to a Poisson distribution when there is no extra-Poisson variation, i.e. the dispersion is zero) and uses empirical Bayes methods to moderate the dispersion estimates before applying Fisher's Exact Test to identify differential expression.
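
To illustrate the idea behind an exact test on counts, here is a minimal sketch of the zero-dispersion (Poisson) special case only. It is not edgeR's implementation: edgeR's exact test works with negative binomial counts and adjusts for library sizes. Conditional on the total count, the count in group A is binomial under the null hypothesis of equal rates. As in Fisher's exact test, the two-sided p-value sums the probabilities of all outcomes no more likely than the one observed. The function name and the frac_a parameter are hypothetical.

```python
import math

def poisson_exact_test(count_a: int, count_b: int, frac_a: float = 0.5) -> float:
    """Two-sided exact test that two Poisson counts share the same rate.

    Conditional on the total n = count_a + count_b, count_a follows
    Binomial(n, frac_a) under the null hypothesis, where frac_a is group A's
    share of the sequencing depth (0.5 for equal library sizes, 0 < frac_a < 1).
    """
    n = count_a + count_b

    def log_pmf(k: int) -> float:
        # Binomial log-probability via lgamma, so large totals do not overflow.
        return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(frac_a) + (n - k) * math.log(1 - frac_a))

    observed = log_pmf(count_a)
    # Sum the probabilities of every outcome no more likely than the observed
    # one (a small tolerance keeps exact ties from being lost to rounding).
    p = sum(math.exp(lp) for lp in map(log_pmf, range(n + 1)) if lp <= observed + 1e-7)
    return min(1.0, p)
```

For example, assuming equal library sizes, summing the two wt and two hy5 columns for AT1G01100.1 in the example matrix gives poisson_exact_test(231, 125), which is far below 0.001.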

Newer versions of edgeR can also compare more than two groups (using generalized linear models). This app performs a pairwise comparison of two groups.

Reference:

Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010 Jan 1;26(1):139-40. http://bioinformatics.oxfordjournals.org/content/26/1/139.long

Quick Start

Inputs

edgeR requires three inputs:

(1) counts matrix

This is a tab-separated file in which each row represents a gene product (transcript, gene, exon, protein) and each column holds the expression counts for one sample. The first row contains the column headers. Example:

Gene    wt1    wt2    hy5    hy5_2
AT1G01070.1    1    0    0    0
AT1G01070.2    1    0    0    0
AT1G01090.1    46    48    33    32
AT1G01100.1    116    115    59    66
AT1G01100.2    116    115    59    66
AT1G01100.3    82    83    35    42
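
If you want to check a counts matrix before uploading it, the following sketch reads a file in the format above. The function name read_counts is hypothetical. The name_col argument is 1-based, mirroring the app's "Column where feature names are found" option. Counts are parsed as floats in case an upstream tool writes non-integer values.

```python
import csv
import io

def read_counts(text: str, name_col: int = 1):
    """Parse a tab-separated counts matrix into (sample names, {feature: counts})."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    header, body = rows[0], rows[1:]
    idx = name_col - 1  # convert the 1-based column number to a list index
    samples = [h for i, h in enumerate(header) if i != idx]
    counts = {}
    for row in body:
        if len(row) != len(header):
            raise ValueError(f"row for {row[idx]!r} has {len(row)} fields, expected {len(header)}")
        counts[row[idx]] = [float(v) for i, v in enumerate(row) if i != idx]
    return samples, counts
```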

(2) comma-separated list of factors for the data columns in your counts matrix file (group factor)

This list denotes the experimental group of each sample. Alphanumeric only.
Example: wt,wt,hy5,hy5

(3) comma-separated pair of factors for comparison

This list denotes which experimental groups should be compared. Alphanumeric only.

Example: wt,hy5
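
The rules for inputs (2) and (3) can be checked mechanically before a run. The sketch below is hypothetical (parse_design is not part of the app). It applies only the rules stated on this page: alphanumeric factors, one factor per sample column, a comparison naming two groups from the list, and replicates in at least one compared group.

```python
import re

_ALNUM = re.compile(r"[A-Za-z0-9]+")

def parse_design(groups_arg: str, compare_arg: str, n_samples: int):
    """Split and validate the group-factor list and the comparison pair."""
    groups = groups_arg.split(",")
    pair = compare_arg.split(",")
    for factor in groups + pair:
        if not _ALNUM.fullmatch(factor):
            raise ValueError(f"factor {factor!r} is not alphanumeric")
    if len(groups) != n_samples:
        raise ValueError(f"{len(groups)} factors given for {n_samples} sample columns")
    if len(pair) != 2 or pair[0] == pair[1]:
        raise ValueError("the comparison must name two different groups")
    for factor in pair:
        if factor not in groups:
            raise ValueError(f"comparison group {factor!r} is not in the group factor list")
    # At least one of the compared conditions must have replicates.
    if max(groups.count(f) for f in pair) < 2:
        raise ValueError("at least one compared group needs replicates")
    return groups, pair
```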

Test Data

Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> edgeR_exact_test.

Analysis

Inputs

Use ccombine_result.txt from the test data directory above as input.

Column where feature names are found: 1

  • this parameter indicates the column in the input file that contains the gene product names/accessions.

Experiment Design

Comma-separated list of factors for the data columns in your file: wt,wt,hy5,hy5

  • this input must use alphanumeric characters only and give one factor for each expression sample (column) in the matrix file (ccombine_result.txt), in the same order as the columns.

Comma-separated pair of factors for comparison: wt,hy5

  • this input must use alphanumeric characters only and name the two conditions to be compared to determine differential expression.

Statistical Options

Minimum sum count across columns: 5

  • rows with a sum count below this number will not be included in the analysis.
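
The minimum-sum filter is simple to reproduce when you want to predict which rows will be analyzed. In this sketch (filter_low_counts is a hypothetical name), a row is kept when its counts summed across all columns reach the threshold, so rows below it are dropped, as described above.

```python
def filter_low_counts(counts: dict, min_sum: float = 5) -> dict:
    """Keep only features whose counts, summed across all columns, are at least min_sum."""
    return {name: row for name, row in counts.items() if sum(row) >= min_sum}
```

With the example matrix and a threshold of 5, AT1G01070.1 (total 1) is removed while AT1G01090.1 (total 159) is kept.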

Select method for multiple testing correction: BH

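  • Holm - step-down refinement of Bonferroni; controls the same family-wise error rate and is uniformly more powerful than the Bonferroni correction
  • Hochberg - step-up method; less conservative than Bonferroni's test
  • Hommel - also controls the family-wise error rate; more powerful than Hochberg
  • Bonferroni - simplest and most conservative method for controlling type I errors (false positives, i.e., rejecting a true null hypothesis)
  • BH - Benjamini-Hochberg correction; controls the false discovery rate
  • BY - Benjamini & Yekutieli correction; controls the false discovery rate under arbitrary dependence between tests, at the cost of being more conservative than BH
  • FDR - False Discovery Rate; in R's p.adjust, "fdr" is an alias for BH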
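
To make two of these corrections concrete, here is a sketch of Bonferroni and Benjamini-Hochberg adjustments written to follow the definitions used by R's p.adjust. The Python function names are hypothetical, and this is not the app's code. BH scales each p-value by m/rank and then takes a running minimum from the largest p-value downward, so that the adjusted values stay in the same order as the raw p-values.

```python
def p_adjust_bonferroni(pvalues: list) -> list:
    """Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def p_adjust_bh(pvalues: list) -> list:
    """Benjamini-Hochberg step-up adjustment, as defined for p.adjust(method = "BH") in R."""
    m = len(pvalues)
    # Visit p-values from largest to smallest so a running minimum enforces monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    for pos, i in enumerate(order):
        rank = m - pos  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For p-values [0.01, 0.04, 0.03, 0.005], BH gives [0.02, 0.04, 0.04, 0.02], while Bonferroni gives [0.04, 0.16, 0.12, 0.02], which shows how much more conservative Bonferroni is.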

Minimum FDR returned: 0.05

  • the false discovery rate cutoff applied to the adjusted p-values when selecting which results to report as significant; usually 0.01 or 0.05.
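
Applying the cutoff amounts to keeping the rows whose adjusted p-value passes it. This sketch assumes a hypothetical results mapping of feature name to (p-value, adjusted p-value). It also assumes the comparison is "at or below the cutoff"; the app may use a strict inequality instead.

```python
def select_significant(results: dict, fdr_cutoff: float = 0.05) -> list:
    """Return (name, p, adj_p) for features with adj_p <= fdr_cutoff, most significant first."""
    hits = [(name, p, adj) for name, (p, adj) in results.items() if adj <= fdr_cutoff]
    return sorted(hits, key=lambda hit: hit[2])
```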

Custom-specified dispersion: not used in this test.

Entering a value overrides edgeR's own dispersion estimate. The value must be between 0.0 and 1.0. In experiments without replicates, the dispersion is taken as 0.0 (the Poisson case).
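
The sketch below shows why replicates are needed to estimate the dispersion φ. It computes the negative binomial variance for a given dispersion and a crude method-of-moments dispersion estimate from one feature's replicate counts. Both function names are hypothetical. This is not edgeR's estimator, which is likelihood-based and shares information across genes. The sketch also assumes equal library sizes.

```python
from statistics import mean, variance

def nb_variance(mu: float, phi: float) -> float:
    """Negative binomial variance mu + phi * mu**2; phi = 0 gives the Poisson variance mu."""
    return mu + phi * mu * mu

def moment_dispersion(replicates: list) -> float:
    """Crude method-of-moments dispersion for one feature: (s^2 - m) / m^2.

    Assumes equal library sizes across replicates. At least two replicates
    are required because a sample variance needs two or more values;
    estimates below zero (less spread than Poisson) are truncated to 0.
    """
    if len(replicates) < 2:
        raise ValueError("dispersion cannot be estimated without replicates")
    m = mean(replicates)
    if m == 0:
        return 0.0  # all-zero counts carry no information about dispersion
    return max(0.0, (variance(replicates) - m) / (m * m))
```

For the wt replicates of AT1G01090.1, moment_dispersion([46, 48]) returns 0.0, because the two values differ by less than Poisson noise alone would predict.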

Output Files

edgeR produces two tab-delimited text files as output. Both include the names/accessions from the input file with their calculated p-values and adjusted p-values.

edgeR-all.txt - this file contains the calculated p-values for all accessions/names listed in the input file.

edgeR-significant.txt - this file contains only the accessions/names that are considered differentially expressed based on the statistical analysis performed.

For the test case, the output files in the example_data directory are named edgeR-all.txt and edgeR-significant.txt.

Sample lines from the example output files (the header row has no label for the first column, which holds the feature name):

edgeR-all.txt
logConc    logFC    P.Value    adj.P.Val
Locus_7502_Transcript_1/1_Confidence_1.000_Length_280.mrna1.exon1    -7.76628523647046    -1.93212021058096    6.42270046482157e-219    3.14416878554875e-214
Locus_885_Transcript_1/1_Confidence_1.000_Length_365.mrna1.exon1    -10.6497833642559    -3.35713268429383    2.43547058935892e-160    5.96130136157382e-156

edgeR-significant.txt
logConc    logFC    P.Value    adj.P.Val
Locus_7502_Transcript_1/1_Confidence_1.000_Length_280.mrna1.exon1    -7.76628523647046    -1.93212021058096    6.42270046482157e-219    3.14416878554875e-214
Locus_885_Transcript_1/1_Confidence_1.000_Length_365.mrna1.exon1    -10.6497833642559    -3.35713268429383    2.43547058935892e-160    5.96130136157382e-156
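
Because the header row names only four columns while each data row has five fields, a generic table reader can misalign the columns. The sketch below reads either output file by treating the first field of each data row as the feature name. The function name is hypothetical, and the sketch assumes the tab-delimited layout shown in the samples above.

```python
import csv
import io

def read_edger_table(text: str) -> dict:
    """Read an edgeR output table whose header omits the feature-name column.

    Returns {feature: {column name: value}}.
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    header = rows[0]  # e.g. logConc, logFC, P.Value, adj.P.Val
    table = {}
    for row in rows[1:]:
        name, values = row[0], row[1:]
        if len(values) != len(header):
            raise ValueError(f"unexpected number of fields for {name!r}")
        table[name] = {col: float(v) for col, v in zip(header, values)}
    return table
```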

Tool Source for App