edgeR (multifactorial pairwise comparisons) in DE

Rationale and background

The existing edgeR app in the DE does not support multifactorial pairwise comparison of RNA-Seq data for differential gene expression analysis. The edgeR (multifactorial pairwise comparisons) app has been added to provide this functionality. Based on SARTools (an R package dedicated to the differential analysis of RNA-Seq data), it provides tools to generate descriptive and diagnostic graphs, to run the differential analysis with the edgeR package, and to export the results into easily readable tab-delimited files. It also generates an HTML report that displays all the figures produced, explains the statistical methods, and gives the results of the differential analyses.

The SARTools R package was developed at PF2 - Institut Pasteur by M.-A. Dillies and H. Varet (hugo.varet@pasteur.fr). When using this tool in any published analysis, please cite: H. Varet, L. Brillet-Guéguen, J.-Y. Coppee and M.-A. Dillies, SARTools: A DESeq2- and EdgeR-Based R Pipeline for Comprehensive Differential Analysis of RNA-Seq Data, PLoS One, 2016, doi: http://dx.doi.org/10.1371/journal.pone.0157022.

Introduction and overview

While there are many tools that determine differential expression for microarray data (such as limma), these tools assume a continuous expression response (fluorescence intensity), whereas RNA-Seq, ChIP-Seq, SAGE, DGE, and proteomics data generate expression counts. The edgeR (empirical analysis of DGE in R) app compares expression counts from different experimental data sets to identify differentially expressed gene products. edgeR models the counts with a negative binomial distribution (which simplifies to a Poisson distribution when there is no variation between replicates) and simplifies the estimation of over-dispersion by assuming that mean and variance are related, which allows it to be applied to experiments with small numbers of replicates. It uses empirical Bayes inference to moderate the dispersion estimates and then applies an exact test, analogous to Fisher's Exact Test, to identify differential expression. At least one of the experimental conditions must be replicated.

Robinson, Mark D., Davis J. McCarthy, and Gordon K. Smyth. "edgeR: a Bioconductor package for differential expression analysis of digital gene expression data." Bioinformatics. 2010 Jan 1;26(1):139-40. http://bioinformatics.oxfordjournals.org/content/26/1/139.long
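
The mean-variance relationship mentioned above can be illustrated with a short Python sketch. It is not edgeR's implementation; it only assumes the common parameterization in which a negative binomial count with mean `mean` and dispersion `dispersion` has variance `mean + dispersion * mean**2`. The function names are hypothetical and chosen for this example.

```python
import math


def nb_variance(mean, dispersion):
    """Variance of a negative binomial count, assuming
    var = mean + dispersion * mean**2 (dispersion = 0 gives Poisson)."""
    return mean + dispersion * mean ** 2


def poisson_pmf(k, mean):
    """Poisson probability of observing count k, computed on the log scale."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def nb_pmf(k, mean, dispersion):
    """Negative binomial probability of observing count k.

    Uses size r = 1 / dispersion and success probability p = r / (r + mean).
    Falls back to the Poisson formula when dispersion is exactly zero.
    """
    if dispersion == 0:
        return poisson_pmf(k, mean)
    r = 1.0 / dispersion
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mean))
        + k * math.log(mean / (r + mean))
    )
    return math.exp(log_pmf)


if __name__ == "__main__":
    # With no extra variation the variance equals the mean (Poisson case).
    print(nb_variance(100, 0.0))
    # With dispersion 0.25 the variance grows quadratically with the mean.
    print(nb_variance(100, 0.25))
    # A tiny dispersion gives probabilities very close to the Poisson ones.
    print(nb_pmf(5, 10.0, 1e-6), poisson_pmf(5, 10.0))
```

The larger the dispersion, the more replicate counts are expected to scatter around the mean, which is why over-dispersed count data are poorly described by a plain Poisson model.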

Prerequisites

  1. A CyVerse account. (Register for a CyVerse account at https://user.cyverse.org/.)

  2. An up-to-date Java-enabled web browser. (Firefox is recommended. If you wish to work with your own large datasets and upload them using iCommands, Chrome is not suitable because of its problems with 64-bit Java.)

  3. Input files:

  4. Parameters

This tutorial uses the test data stored in the Data Store at Community Data > iplantcollaborative > example_data > edgeR_multi.

Starting an edgeR (multifactorial pairwise comparisons) job in the DE

  1. In the DE Apps window, search for and open edgeR (multifactorial pairwise comparisons).

  2. In the Analysis Name field:

    1. Change the name for your analysis (optional).

    2. Enter any comments (optional).

    3. In the Select output folder field, click Browse and navigate to the folder of your choice, or leave the default name of iplant/home/username/analyses.

    4. To retain copies of the input files in your analysis results output folder, click the Retain Inputs checkbox.

  3. Click to open the Input files panel:

    1. Click Browse to select the inputs, using either the file_type test data or the folder_type test data:

    • To test the file_type test data:

      1. For the Target file, browse to select target3.txt inside file_type.  

      2. For the Raw counts file, browse to select counts3.txt.

    • To test the folder_type test data:

      1. For the Target file, browse to select target3.txt inside file_type.

      2. For the Raw counts folder, browse to select raw1 inside folder_type.

  4. Click on the Parameters panel and enter the following:

  5. Click Launch Analysis.

  6. After the app completes successfully, the following files and figures are generated from the test run.

     Tab-delimited files are exported to the tables directory. They store per-feature information such as $\log_2(\text{FC})$ and p-values, and can be read easily in a spreadsheet.
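
Besides a spreadsheet, the exported tables can be read programmatically. The following is a minimal Python sketch under stated assumptions: the column names `Id`, `log2FoldChange`, and `padj`, the thresholds, and the function name `significant_features` are assumptions for illustration and should be checked against the header of the actual file in the tables directory. The inline table is made-up toy data, not output from the test run.

```python
import csv
import io

# Made-up toy table for illustration only; real files come from the tables directory.
TOY_TABLE = (
    "Id\tlog2FoldChange\tpadj\n"
    "geneA\t2.5\t0.001\n"
    "geneB\t-0.3\t0.8\n"
    "geneC\t-1.7\t0.02\n"
    "geneD\tNA\tNA\n"
)


def significant_features(handle, id_col="Id", lfc_col="log2FoldChange",
                         padj_col="padj", min_abs_lfc=1.0, max_padj=0.05):
    """Return (id, log2FC, adjusted p-value) for rows passing both thresholds.

    Column names are assumptions; a wrong name raises KeyError so it is noticed.
    """
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            lfc = float(row[lfc_col])
            padj = float(row[padj_col])
        except ValueError:
            continue  # skip rows with non-numeric entries such as "NA"
        if abs(lfc) >= min_abs_lfc and padj <= max_padj:
            hits.append((row[id_col], lfc, padj))
    return hits


if __name__ == "__main__":
    for feature_id, lfc, padj in significant_features(io.StringIO(TOY_TABLE)):
        print(feature_id, lfc, padj)
```

To use it on a downloaded results file, open the file with `open(path, newline="")` and pass the handle in place of the `io.StringIO` object.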

All these parameters will be saved and written at the end of the HTML report in order to keep track of what has been done.

For more information on how to interpret these figures and files, and for troubleshooting and FAQs, please refer to the SARTools vignette for the differential analysis of 2 or more conditions with DESeq2 or edgeR.