Copy with Scaffolding XML of DESeq2 (multifactorial pairwise comparisons)

Rationale and background

Currently, the DESeq apps in the DE (both DESeq and DESeq2) do not allow multifactorial pairwise comparisons of RNA-Seq data for differential gene expression analysis. The "DESeq2 (multifactorial pairwise comparisons)" app is based on SARTools, an R package dedicated to the differential analysis of RNA-seq data, which supports multifactorial pairwise comparisons. SARTools provides tools to generate descriptive and diagnostic graphs, to run the differential analysis with the DESeq2 package, and to export the results into easily readable tab-delimited files. It also generates an HTML report that displays all the figures produced, explains the statistical methods, and gives the results of the differential analysis.

Note

SARTools does not intend to replace DESeq2 or edgeR: it simply provides an environment to go with them. For more details about the methodology behind DESeq2, users should read its documentation and papers. In addition, the current app is not intended to perform edgeR's GLM analysis; that version is currently in progress.

Introduction and Overview

DESeq2 estimates lists of differentially expressed genes based on a negative binomial distribution model. Earlier methods for identifying differentially expressed genes assumed a Poisson distribution; however, the Poisson distribution does not account for the extra variation (overdispersion) found in expression data. DESeq2 uses a negative binomial distribution (similar to edgeR) and shares information across genes when estimating the dispersion, which keeps the variance estimates stable when there are only a few replicates.
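In symbols, for the count $K$ of a gene with mean $\mu$ (notation introduced here for illustration), the two models differ only in how the variance grows with the mean; $\alpha \ge 0$ is the gene-wise dispersion that DESeq2 estimates:

$$
\operatorname{Var}(K) = \mu \quad \text{(Poisson)}, \qquad \operatorname{Var}(K) = \mu + \alpha\,\mu^{2} \quad \text{(negative binomial)}
$$

Setting $\alpha = 0$ recovers the Poisson case, while $\alpha > 0$ allows the extra, overdispersed variation seen between biological replicates.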
The input is a tab-delimited table of raw read counts per gene (or a folder of per-sample count files), together with a target file describing the samples. The outputs include tables with the results of differential expression testing (one with all features, and others restricted to the features whose adjusted p-value is below the chosen false-discovery-rate threshold). Also included, for visualization, are plots of the estimated dispersions, the log fold changes against the mean normalized counts (MA-plots) and histograms of the raw p-values. The plots are purely for visualization purposes and may not be necessary for all users.

Prerequisites

  1. A CyVerse account (Register for a CyVerse account at https://user.cyverse.org/).

  2. An up-to-date Java-enabled web browser. (Firefox recommended. If you wish to work with your own large datasets and upload them using iCommands, Chrome is not suitable due to its issues in utilizing 64-bit Java.)

  3. Input files:

Example of a Raw counts file:

Contig          5_OP_1  5_OP_2  5_OP_3  33_OP_1 33_OP_2 33_OP_3 5_M_1   5_M_2   5_M_3   33_M_1  33_M_2  33_M_3  5_LL_1  5_LL_2  5_LL_3  33_LL_1 33_LL_2 33_LL_3
oystercontig_1  8       54      10      17      3       1       19      47      42      44      6       2       229     47      4       301     231     11
oystercontig_2  16      4       16      56      2       1       2       3       0       28      0       0       2       19      1       8       0       5
oystercontig_3  2       8       3       13      2       2       1       24      20      41      4       8       23      12      1       70      4       13
oystercontig_4  7       2       24      139     2       2       3       1       2       10      0       0       1       1       0       0       0       3
oystercontig_5  0       2       1       1       0       0       0       0       1       0       0       0       1       0       0       0       3       0
oystercontig_6  0       0       0       3       0       0       7       0       0       2       0       0       1       0       0       2       0       0
oystercontig_7  127     30      9       46      13      7       153     111     60      60      2       13      245     205     0       123     74      6
oystercontig_8  154     386     57      561     91      123     566     693     503     851     47      129     634     928     17      375     788     126
oystercontig_9  1       1       0       0       0       0       20      3       4       1       0       0       33      11      0       0       12      1
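The app parses this table itself. Purely as an illustration of the format rules above (tab-delimited, a header row of sample names, one row per feature, non-negative integer counts), here is a minimal Python sketch of a pre-upload check; read_count_matrix is a hypothetical helper, not part of the app or of SARTools.

```python
import csv
import io


def read_count_matrix(text):
    """Parse a tab-delimited raw counts table (hypothetical pre-upload check).

    Expects a header row of sample names, then one row per feature:
    the feature ID followed by one integer count per sample.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    samples = header[1:]  # the first header cell ("Contig") labels the ID column
    counts = {}
    for row in body:
        if not row:
            continue  # skip blank lines
        feature, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError(f"{feature}: expected {len(samples)} counts, got {len(values)}")
        parsed = [int(v) for v in values]  # int() raises ValueError on non-integers such as "2.5"
        if any(v < 0 for v in parsed):
            raise ValueError(f"{feature}: negative count")
        counts[feature] = parsed
    return samples, counts
```

Running a check like this on a copy of your counts file can catch non-integer values (for example, already-normalized or TPM values), which count-based methods such as DESeq2 do not accept.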

 

Example of a per-sample count file (here, sample 33_LL_1) with two tab-delimited columns and no header:

oystercontig_1  301
oystercontig_2  8
oystercontig_3  70
oystercontig_4  0
oystercontig_5  0
oystercontig_6  2
oystercontig_7  123
oystercontig_8  375
oystercontig_9  0

Note

When you supply a folder of per-sample count files, the folder must contain one count file for each row of the target file. If the count and target files are not supplied in the required formats, the app will fail and the analysis will not run.
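The folder rule in this note can be sketched in Python as follows. The target-file layout is an assumption made for this sketch (a tab-delimited file with a header that includes label and files columns, similar to the SARTools target-file convention); check it against the target3.txt test file. load_count_folder is a hypothetical helper, and dropping HTSeq-count style summary lines (for example, __no_feature) only mimics the idea behind the FeaturesToRemove parameter, not its exact implementation.

```python
import csv
import io
import tempfile
from pathlib import Path


def load_count_folder(target_text, count_dir, features_to_remove=()):
    """Merge one headerless two-column count file per sample into a
    feature -> [count per sample] table, in target-file row order (hypothetical helper)."""
    # Assumption: tab-delimited target file whose header contains
    # 'label' (sample name) and 'files' (count file name) columns.
    targets = list(csv.DictReader(io.StringIO(target_text), delimiter="\t"))
    count_dir = Path(count_dir)
    present = {p.name for p in count_dir.iterdir() if p.is_file()}
    listed = {t["files"] for t in targets}
    if present != listed:
        # The folder must hold exactly one count file per target-file row.
        raise ValueError(f"missing {sorted(listed - present)}, extra {sorted(present - listed)}")
    table = {}
    for i, t in enumerate(targets):
        with open(count_dir / t["files"], newline="") as fh:
            for row in csv.reader(fh, delimiter="\t"):
                if not row:
                    continue  # ignore blank lines
                feature, value = row
                # Drop summary lines such as "__no_feature" when listed for removal.
                if feature.lstrip("_") in features_to_remove:
                    continue
                # Features absent from a file stay at 0 (a simplifying choice of this sketch).
                table.setdefault(feature, [0] * len(targets))[i] = int(value)
    return [t["label"] for t in targets], table


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Counts for oystercontig_1/2 come from the example table; the
        # __no_feature lines and the group value are hypothetical.
        Path(d, "33_LL_1.txt").write_text("oystercontig_1\t301\noystercontig_2\t8\n__no_feature\t12\n")
        Path(d, "33_LL_2.txt").write_text("oystercontig_1\t231\noystercontig_2\t0\n__no_feature\t9\n")
        target = "label\tfiles\tgroup\n33_LL_1\t33_LL_1.txt\tLL\n33_LL_2\t33_LL_2.txt\tLL\n"
        print(load_count_folder(target, d, {"no_feature"}))
```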

 

4. Parameters

Test/sample data

This tutorial uses test data stored in the Data Store at Community Data > iplantcollaborative > example_data > DESeq2_multi.

Starting a DESeq2 (multifactorial pairwise comparisons) job in the DE

Open the DE Apps window and search for DESeq2 (multifactorial pairwise comparisons).

In the Analysis Name panel:

  1. Change the name for your analysis (optional).

  2. Enter any comments (optional).

  3. In the Select output folder field, click Browse and navigate to the folder of your choice. You can leave the default location, iplant/home/username/analyses.

  4. To retain copies of the input files in your analysis results output folder, click the Retain Inputs checkbox.

Click the Input files panel:

  1. If you want to test the file_type test data:

    1. For the Target file, browse to select target3.txt inside file_type.

    2. For the Raw counts file, browse to select counts3.txt.

  2. If you want to test the folder_type test data:

    1. For the Target file, browse to select target3.txt inside file_type.

    2. For the Raw counts folder, browse to select raw1 inside folder_type.

  3. Please note: select only one of the two options above.

Click on the Parameters panel:

  1. Project name: test_deseq2_file_type

  2. Author Name: Upendra

  3. Reference biological condition: OP

  4. batch: (leave empty)

  5. Variable of Interest: group

  6. FeaturesToRemove: alignment_not_unique,ambiguous,no_feature,not_aligned,too_low_aQual

  7. locfunc: median

  8. Transformation method for PCA/clustering: VST

  9. Mean-variance relationship: parametric

  10. Independent Filtering: TRUE

  11. Cooks Cutoff: TRUE

  12. Significance threshold: 0.05

  13. p-value adjustment method: BH

  14. colors: dodgerblue,orange,green
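The p-value adjustment method (BH) and the significance threshold (0.05) work together: the raw p-values of each comparison are adjusted for multiple testing with the Benjamini-Hochberg procedure, and features whose adjusted p-value falls below the threshold are reported as significant. The Python sketch below reproduces the BH definition used by R's p.adjust(method = "BH"), for illustration only; the app performs this step in R, and benjamini_hochberg is a hypothetical name.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order
    (same definition as R's p.adjust(method = "BH"))."""
    m = len(pvalues)
    # Visit p-values from largest to smallest so a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted p-values are capped at 1
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # ascending rank of pvalues[i] (1 = smallest)
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For the p-values [0.01, 0.04, 0.03, 0.005], the adjusted values are approximately [0.02, 0.04, 0.04, 0.02], so all four would pass a 0.05 threshold.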

Click Launch Analysis.

Output from the DESeq2 (multifactorial pairwise comparisons) app:

The following files and figures are generated:

  • barplotTC.png: total number of reads per sample;

  • barplotNull.png: percentage of null counts per sample;

  • densplot.png: estimation of the density of the counts for each sample;

  • majSeq.png: percentage of reads caught by the feature having the highest count in each sample;

  • pairwiseScatter.png: pairwise scatter plot between each pair of samples and SERE values (not produced if more than 30 samples);

  • diagSizeFactorsHist.png: diagnostic of the estimation of the size factors;

  • diagSizeFactorsTC.png: plot of the size factors vs the total number of reads;

  • countsBoxplot.png: boxplots on raw and normalized counts;

  • cluster.png: hierarchical clustering of the samples (based on VST or rlog data for DESeq2);

  • PCA.png: first and second factorial planes of the PCA on the samples based on VST or rlog data;

  • dispersionsPlot.png: graph of the estimations of the dispersions and diagnostic of log-linearity of the dispersions;

  • rawpHist.png: histogram of the raw p-values for each comparison;

  • MAplot.png: MA-plot for each comparison (log ratio of the means vs intensity);

  • volcanoPlot.png: volcano plot for each comparison ($-\log_{10}\text{(adjusted P value)}$ vs log ratio of the means).
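Several of these figures (diagSizeFactorsHist.png, diagSizeFactorsTC.png and the normalized-count panel of countsBoxplot.png) rest on DESeq2's median-of-ratios size factors, and the locfunc: median parameter selects the median as the summary statistic. A minimal Python sketch of that calculation, for illustration only (size_factors is a hypothetical name; the app itself runs DESeq2 in R):

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: dict mapping feature -> list of raw counts (same sample order in every list).
    """
    n_samples = len(next(iter(counts.values())))
    ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c == 0 for c in row):
            continue  # a zero makes the geometric mean 0, so the feature is skipped
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            ratios[j].append(math.exp(math.log(c) - log_geo_mean))  # c / geometric mean
    # The median of each sample's ratios is its size factor (locfunc = median).
    return [statistics.median(r) for r in ratios]
```

Dividing each raw count by its sample's size factor gives the normalized counts. For the example raw counts table above, only oystercontig_1, oystercontig_3 and oystercontig_8 have no zero counts, so only they would contribute to the ratios.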


Several tab-delimited files are exported to the tables directory. They store information on the features, such as $\log_2\text{(FC)}$ or p-values, and can be read easily in a spreadsheet:

  • TestVsRef.complete.txt: contains all the features studied;

  • TestVsRef.down.txt: contains only significant down-regulated features, i.e., less expressed in Test than in Ref;

  • TestVsRef.up.txt: contains only significant up-regulated features, i.e., more expressed in Test than in Ref.
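In these file names, Test and Ref stand for the two conditions of a given comparison (for example, a condition versus the reference condition OP set above), so a positive $\log_2\text{(FC)}$ means higher expression in Test. A minimal Python sketch of how up and down lists can be derived from a complete table; the column names Id, log2FoldChange and padj are assumptions (check the header of your own TestVsRef.complete.txt), the strict padj < alpha comparison is a choice of this sketch, and split_up_down is a hypothetical name.

```python
import csv
import io


def split_up_down(complete_text, alpha=0.05):
    """Split a tab-delimited results table into up- and down-regulated feature IDs.

    Assumes (hypothetically) columns named 'Id', 'log2FoldChange' and 'padj'.
    """
    up, down = [], []
    for row in csv.DictReader(io.StringIO(complete_text), delimiter="\t"):
        if row["padj"] in ("", "NA"):
            continue  # no adjusted p-value (e.g. removed by independent filtering or Cook's cutoff)
        if float(row["padj"]) >= alpha:
            continue  # not significant at the chosen threshold
        lfc = float(row["log2FoldChange"])
        if lfc > 0:
            up.append(row["Id"])    # more expressed in Test than in Ref
        elif lfc < 0:
            down.append(row["Id"])  # less expressed in Test than in Ref
    return up, down
```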

For more information on how to interpret these figures and files, and for troubleshooting and FAQs, please refer here.