edgeR with Fisher's Exact Test
Summary
Performs differential expression analysis on count-based expression data sets using edgeR. Significance testing is pairwise, using edgeR's exact test (an analogue of Fisher's Exact Test for negative binomial counts).
Introduction
While there are many tools that determine differential expression for microarray data, these tools assume a continuous expression response (fluorescence intensity), whereas RNA-Seq, ChIP-Seq, SAGE, DGE and proteomics data generate expression counts. edgeR (empirical analysis of digital gene expression data in R) compares expression counts from different experimental conditions and uses an exact test analogous to Fisher's Exact Test to identify differentially expressed gene products. edgeR models counts with a negative binomial distribution, in which the variance is tied to the mean (variance = mean + dispersion × mean²). Because this relationship reduces overdispersion to a single dispersion parameter, overdispersion is simple to estimate, which allows the method to be applied to experiments with small numbers of replicates.
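The mean-variance relationship can be illustrated with a short Python sketch (not edgeR code). It evaluates a negative binomial probability mass function parameterized by a mean mu and a dispersion phi, then checks numerically that the variance equals mu + phi*mu^2, with phi = 0 giving the Poisson case. The function names nb_pmf and moments are illustrative only.

```python
import math


def nb_pmf(k, mu, phi):
    """P(count = k) for a negative binomial with mean mu and dispersion phi.

    phi == 0 is treated as the Poisson limit (no extra variation).
    """
    if mu == 0:
        return 1.0 if k == 0 else 0.0
    if phi == 0:
        return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))
    r = 1.0 / phi  # "size" parameter of the negative binomial
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))
    return math.exp(log_p)


def moments(mu, phi, kmax=2000):
    """Mean and variance obtained by summing the pmf over k = 0..kmax."""
    probs = [nb_pmf(k, mu, phi) for k in range(kmax + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


for phi in (0.0, 0.1, 0.5):
    mean, var = moments(20.0, phi)
    expected = 20.0 + phi * 20.0 ** 2  # mu + phi * mu^2
    print(f"phi={phi}: mean={mean:.3f} variance={var:.3f} expected={expected:.1f}")
```

The mean stays at 20 for every phi, while the variance grows with the dispersion; this is the extra variation (overdispersion) that a Poisson model cannot capture.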
At least one of the experimental conditions must have replicates. edgeR assumes a negative binomial distribution (which reduces to a Poisson distribution when the dispersion is zero, i.e., when there is no extra-Poisson variation) and uses empirical Bayes methods, which borrow information across genes, to moderate the dispersion estimates before applying the exact test to identify differential expression.
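To make the exact test concrete, here is a simplified Python sketch for a single gene. It assumes equal library sizes (edgeR first adjusts counts to a common library size, a step omitted here), a known dispersion phi, and a Fisher-style two-sided rule that sums the probabilities of every split of the gene's total count that is no more likely than the observed split. edgeR offers more than one way to define the rejection region, so treat this as an illustration of the idea rather than edgeR's implementation; exact_test and nb_logpmf are hypothetical names.

```python
import math


def nb_logpmf(k, mu, phi):
    """log P(count = k) for a negative binomial with mean mu, dispersion phi."""
    if mu == 0:
        return 0.0 if k == 0 else -math.inf
    if phi == 0:  # Poisson limit
        return k * math.log(mu) - mu - math.lgamma(k + 1)
    r = 1.0 / phi
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))


def exact_test(counts_a, counts_b, phi):
    """Two-sided exact p-value for one gene (equal library sizes assumed)."""
    n_a, n_b = len(counts_a), len(counts_b)
    y_a = sum(counts_a)
    total = y_a + sum(counts_b)
    mu = total / (n_a + n_b)  # common per-sample mean under "no difference"
    # A sum of n independent NB(mu, phi) counts is NB(n * mu, phi / n), so the
    # probability of each split (a, total - a) is a product of two NB terms.
    logs = [nb_logpmf(a, n_a * mu, phi / n_a)
            + nb_logpmf(total - a, n_b * mu, phi / n_b)
            for a in range(total + 1)]
    top = max(logs)
    probs = [math.exp(x - top) for x in logs]  # rescaled to avoid underflow
    p_obs = probs[y_a]
    # Fisher-style rule: add every split no more likely than the observed one
    # (a small relative tolerance absorbs floating-point ties).
    tail = sum(p for p in probs if p <= p_obs * (1 + 1e-7))
    return min(1.0, tail / sum(probs))


print(exact_test([10], [0], 0.0))
print(exact_test([20, 22], [2, 3], 0.0), exact_test([20, 22], [2, 3], 0.5))
```

With phi = 0 the conditional distribution of the split is binomial, so exact_test([10], [0], 0.0) equals the two-sided binomial p-value 2/1024. Raising phi makes extreme splits more plausible, which is why the same counts give a larger p-value at phi = 0.5.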
Newer versions of edgeR also support comparisons among more than two groups; this app compares one pair of groups at a time.
Reference:
Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010 Jan 1;26(1):139-40. http://bioinformatics.oxfordjournals.org/content/26/1/139.long
Quick Start
Inputs
edgeR requires three inputs:
(1) counts matrix
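This is a tab-separated file in which each row represents a gene product (transcript, gene, exon, protein) and each column holds the expression counts of one sample. The first row is a header with the sample names; in this example, the first column holds the gene product names. Example: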
Gene wt1 wt2 hy5 hy5_2
AT1G01070.1 1 0 0 0
AT1G01070.2 1 0 0 0
AT1G01090.1 46 48 33 32
AT1G01100.1 116 115 59 66
AT1G01100.2 116 115 59 66
AT1G01100.3 82 83 35 42
(2) comma-separated list of factors for the data columns in your counts matrix file (group factor)
This list assigns each sample (data column) to an experimental group, in column order. Use alphanumeric characters only.
Example: wt,wt,hy5,hy5
(3) comma-separated pair of factors for comparison
This pair names the two experimental groups to compare; both must appear in the group factor list. Use alphanumeric characters only.
Example: wt,hy5
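The three inputs can be checked before running the app. The following Python sketch (hypothetical helper names, not part of the app) reads a tab-separated counts matrix and checks the group factor and comparison pair against the rules above: one alphanumeric label per data column, a comparison pair drawn from those labels, and replicates in at least one of the compared groups, as required in the Introduction.

```python
import csv
import io
import re

ALNUM = re.compile(r"[A-Za-z0-9]+")


def read_counts(text, name_col=1):
    """Parse a tab-separated counts matrix; name_col is 1-based, as in the app."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    idx = name_col - 1
    samples = [h for j, h in enumerate(rows[0]) if j != idx]
    counts = {}
    for row in rows[1:]:
        counts[row[idx]] = [int(v) for j, v in enumerate(row) if j != idx]
    return samples, counts


def check_design(samples, group_factor, comparison):
    """Return a list of problems with the experiment design (empty if none)."""
    groups = group_factor.split(",")
    pair = comparison.split(",")
    problems = []
    if len(groups) != len(samples):
        problems.append(f"{len(groups)} group labels for {len(samples)} data columns")
    for label in groups + pair:
        if not ALNUM.fullmatch(label):
            problems.append(f"label {label!r} is not alphanumeric")
    if len(pair) != 2:
        problems.append("comparison must name exactly two groups")
    else:
        for g in pair:
            if g not in groups:
                problems.append(f"group {g!r} does not appear in the group factor")
        if all(groups.count(g) < 2 for g in pair):
            problems.append("at least one compared group needs replicates")
    return problems


example = (
    "Gene\twt1\twt2\thy5\thy5_2\n"
    "AT1G01070.1\t1\t0\t0\t0\n"
    "AT1G01090.1\t46\t48\t33\t32\n"
)
samples, counts = read_counts(example)
print(samples)  # ['wt1', 'wt2', 'hy5', 'hy5_2']
print(check_design(samples, "wt,wt,hy5,hy5", "wt,hy5"))  # []
```

Note that only the group labels need to be alphanumeric; a sample name such as hy5_2 in the header is fine.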
Test Data
Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> edgeR_exact_test.
Analysis
Inputs
Use ccombine_result.txt from the directory above as test input.
Column where feature names are found: 1
- this parameter indicates the column in the input file that contains the gene product names/accessions.
Experiment Design
Comma-separated list of factors for the data columns in your file: wt,wt,hy5,hy5
- this input must use alphanumeric characters only and give one group label for each expression sample (column) in the matrix file (ccombine_result.txt), in column order.
Comma-separated pair of factors for comparison: wt,hy5
- this input must use alphanumeric characters only and name the two conditions (group labels from the previous field) to be compared to determine differential expression.
Statistical Options
Minimum sum count across columns: 5
- rows with a sum count below this number will not be included in the analysis.
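The minimum-sum filter amounts to a simple row filter. A Python sketch, assuming (as the parameter name suggests) that the sum is taken over all data columns and that rows exactly at the threshold are kept; filter_low_counts is a hypothetical name, applied here to three rows of the example counts matrix.

```python
def filter_low_counts(counts, min_sum=5):
    """Keep rows whose total count across all data columns is at least min_sum."""
    return {name: row for name, row in counts.items() if sum(row) >= min_sum}


counts = {
    "AT1G01070.1": [1, 0, 0, 0],
    "AT1G01090.1": [46, 48, 33, 32],
    "AT1G01100.3": [82, 83, 35, 42],
}
kept = filter_low_counts(counts, min_sum=5)
print(sorted(kept))  # ['AT1G01090.1', 'AT1G01100.3']
```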
Select method for multiple testing correction: BH
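- Holm - Holm's step-down correction, based upon Bonferroni; uniformly more powerful than the Bonferroni correction
- Hochberg - step-up correction, less conservative than Bonferroni's (and Holm's); valid when the tests are independent or non-negatively associated
- Hommel - based on Simes' test; slightly more powerful than Hochberg's method under the same assumptions
- Bonferroni - simplest and most conservative method; controls the family-wise error rate, i.e., the probability of making any type I error (rejecting a true null hypothesis, a false positive)
- BH - Benjamini & Hochberg correction; controls the false discovery rate (FDR)
- BY - Benjamini & Yekutieli correction; controls the FDR under arbitrary dependence between tests, at the cost of being more conservative than BH
- FDR - False Discovery Rate; in R's p.adjust function this is an alias for BH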
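For reference, here is a Python sketch of several of these corrections, written to follow the formulas used by R's p.adjust function (Hommel's method is omitted for brevity). p_adjust is a hypothetical name, and this is not the code the app runs.

```python
import math


def p_adjust(pvals, method="BH"):
    """Adjust p-values for multiple testing, following R's p.adjust formulas.

    Supported methods: "bonferroni", "holm", "hochberg", "BH" (alias "fdr"), "BY".
    """
    n = len(pvals)
    if method == "bonferroni":
        return [min(1.0, n * p) for p in pvals]
    out = [0.0] * n
    if method == "holm":
        # Step-down: multiply the i-th smallest p-value by (n - i + 1), then
        # keep a running maximum so adjusted values never decrease.
        running = 0.0
        ascending = sorted(range(n), key=lambda j: pvals[j])
        for i, idx in enumerate(ascending, start=1):
            running = max(running, (n - i + 1) * pvals[idx])
            out[idx] = min(1.0, running)
        return out
    if method in ("hochberg", "BH", "fdr", "BY"):
        # BY inflates BH by the harmonic sum 1 + 1/2 + ... + 1/n.
        harmonic = sum(1.0 / k for k in range(1, n + 1)) if method == "BY" else 1.0
        running = math.inf
        descending = sorted(range(n), key=lambda j: pvals[j], reverse=True)
        # Step-up: walk from the largest p-value (rank n) down, keeping a running minimum.
        for rank, idx in zip(range(n, 0, -1), descending):
            if method == "hochberg":
                factor = n - rank + 1
            else:
                factor = harmonic * n / rank
            running = min(running, factor * pvals[idx])
            out[idx] = min(1.0, running)
        return out
    raise ValueError(f"unsupported method: {method}")


pvals = [0.01, 0.04, 0.03, 0.005]
for m in ("bonferroni", "holm", "hochberg", "BH", "BY"):
    print(m, [round(x, 4) for x in p_adjust(pvals, m)])
```

On these four p-values the ordering of stringency is visible directly: Bonferroni gives the largest adjusted values, BH the smallest, and BY sits above BH by the harmonic-sum factor.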
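Minimum FDR returned: 0.05
- the adjusted p-value cutoff, usually 0.01 or 0.05. Despite the label, it acts as an upper bound: gene products whose adjusted p-value falls below it are reported in edgeR-significant.txt.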
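Selecting the significant rows can be sketched in Python as follows, assuming a strict "below the cutoff" comparison (the app's handling of a value exactly at the cutoff is not documented here) and ordering by raw p-value as in the sample output. select_significant and the gene names and values below are hypothetical.

```python
def select_significant(results, fdr_cutoff=0.05):
    """results: iterable of (name, p_value, adj_p_value) tuples.

    Returns the rows whose adjusted p-value is strictly below the cutoff,
    smallest raw p-value first.
    """
    hits = [r for r in results if r[2] < fdr_cutoff]
    return sorted(hits, key=lambda r: r[1])


# Hypothetical results, for illustration only.
results = [
    ("geneA", 0.001, 0.02),
    ("geneB", 0.2, 0.5),
    ("geneC", 1e-5, 0.001),
]
print([name for name, _, _ in select_significant(results, 0.05)])  # ['geneC', 'geneA']
```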
Custom-specified dispersion: not used in this test.
- entering a value overrides edgeR's own dispersion estimate. The value must be between 0.0 and 1.0. In experiments without replicates, the dispersion approaches 0.0.
Output Files
edgeR produces two tab-delimited text files as output. Both include the names/accessions from the input file with their calculated p-values and adjusted p-values.
edgeR-all.txt - contains the calculated p-values for all accessions/names listed in the input file.
edgeR-significant.txt - contains only the accessions/names considered differentially expressed, i.e., those whose adjusted p-value passes the "Minimum FDR returned" cutoff.
For the test case, output files with these names (edgeR-all.txt and edgeR-significant.txt) are in the example_data directory.
Sample lines from the example output files:
edgeR-all.txt
logConc logFC P.Value adj.P.Val
Locus_7502_Transcript_1/1_Confidence_1.000_Length_280.mrna1.exon1 -7.76628523647046 -1.93212021058096 6.42270046482157e-219 3.14416878554875e-214
Locus_885_Transcript_1/1_Confidence_1.000_Length_365.mrna1.exon1 -10.6497833642559 -3.35713268429383 2.43547058935892e-160 5.96130136157382e-156
edgeR-significant.txt
logConc logFC P.Value adj.P.Val
Locus_7502_Transcript_1/1_Confidence_1.000_Length_280.mrna1.exon1 -7.76628523647046 -1.93212021058096 6.42270046482157e-219 3.14416878554875e-214
Locus_885_Transcript_1/1_Confidence_1.000_Length_365.mrna1.exon1 -10.6497833642559 -3.35713268429383 2.43547058935892e-160 5.96130136157382e-156
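In the samples above, the header line lists four column names while each data row has five fields, because the first field of each row is the feature name (the usual layout when R writes a table with row names). A Python sketch for loading either output file under that assumption, using one row from the sample; read_edger_table is a hypothetical name. The sample on this page is shown with spaces, but the files themselves are tab-delimited.

```python
import csv
import io


def read_edger_table(text):
    """Parse an edgeR-all.txt / edgeR-significant.txt style table.

    The header names only the value columns, while each data row starts with
    the feature name, so data rows have one more field than the header.
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    header = rows[0]
    if header and header[0] == "":
        header = header[1:]  # tolerate an empty leading header cell
    table = {}
    for row in rows[1:]:
        if len(row) != len(header) + 1:
            raise ValueError(f"unexpected field count in row: {row!r}")
        table[row[0]] = {col: float(v) for col, v in zip(header, row[1:])}
    return table


sample = (
    "logConc\tlogFC\tP.Value\tadj.P.Val\n"
    "Locus_885_Transcript_1/1_Confidence_1.000_Length_365.mrna1.exon1\t"
    "-10.6497833642559\t-3.35713268429383\t2.43547058935892e-160\t5.96130136157382e-156\n"
)
table = read_edger_table(sample)
for name, values in table.items():
    print(name, values["logFC"], values["adj.P.Val"])
```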