
The DE Quick Start tutorial provides an introduction to basic DE functionality and navigation.

Please work through the tutorial and add your comments at the bottom of this page, or send them by email to kchougul@cshl.edu. Thank you.

Rationale and background:

Ballgown: Ballgown bridges the gap between transcriptome assembly and expression analysis

Alyssa C Frazee, Geo Pertea, Andrew E Jaffe, Ben Langmead, Steven L Salzberg, and Jeffrey T Leek

Nat Biotechnol. 2015 Mar; 33(3): 243–246.

Ballgown takes the transcripts and expression levels from StringTie and applies rigorous statistical methods to determine which transcripts are differentially expressed between two or more experiments. It uses the abundance data produced by StringTie to perform differential expression analysis at the gene, transcript, exon, or junction level, and it supports both time-series and fixed-condition designs. The R/Bioconductor package also includes plotting tools that help visualize the results.

The required inputs for Ballgown are the *.ctab files generated for each sample by StringTie.
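As a rough pre-flight check, the sketch below verifies that each sample directory holds the five .ctab tables StringTie writes when run with its -B option. The one-sub-directory-per-sample layout and the names sample_1, sample_2, missing_ctab_files, and check_samples are assumptions made for illustration; they are not part of the DE app.

```python
from pathlib import Path
import tempfile

# The five tables StringTie writes for each sample when run with -B.
EXPECTED_CTAB = {"e_data.ctab", "i_data.ctab", "t_data.ctab", "e2t.ctab", "i2t.ctab"}


def missing_ctab_files(sample_dir):
    """Return the expected ctab file names that are absent from sample_dir."""
    present = {p.name for p in Path(sample_dir).glob("*.ctab")}
    return sorted(EXPECTED_CTAB - present)


def check_samples(top_dir):
    """Map each sample sub-directory name to its list of missing ctab files."""
    return {
        d.name: missing_ctab_files(d)
        for d in sorted(Path(top_dir).iterdir())
        if d.is_dir()
    }


# Demo on a throwaway layout (hypothetical sample names):
# sample_1 is complete, sample_2 lacks the two intron tables.
with tempfile.TemporaryDirectory() as tmp:
    layout = {
        "sample_1": EXPECTED_CTAB,
        "sample_2": EXPECTED_CTAB - {"i_data.ctab", "i2t.ctab"},
    }
    for sample, files in layout.items():
        sample_path = Path(tmp) / sample
        sample_path.mkdir()
        for name in files:
            (sample_path / name).touch()
    report = check_samples(tmp)

print(report)
```

An empty list for a sample means all five tables are present; any other value names the files that still need to be generated or uploaded.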

Ballgown is also available as an R Shiny web app on the SciApps analysis platform for interactive visualization of the differential expression results. We will use the tar.gz output from the DE Ballgown app for visualization in SciApps.

Ballgown software (https://github.com/alyssafrazee/ballgown)

Pre-Requisites

A CyVerse account. (Register for a CyVerse account at user.cyverse.org)
Before using the Ballgown R package, a few preprocessing steps are necessary:
1. RNA-Seq reads should be aligned to a reference genome.
2. A transcriptome should be assembled, or a reference transcriptome should be downloaded.
3. Expression for the features (transcripts, exons, and intron junctions) in the transcriptome should be estimated in a Ballgown-readable format.
Two sample pipelines for preprocessing are as follows:
1. Pipeline 1: TopHat2 + StringTie
2. Pipeline 2: TopHat2 + Cufflinks + Tablemaker
Both pipelines produce output in a Ballgown-readable format.
Mandatory arguments
1. Directory of ctab files from the StringTie output
2. Design matrix file, e.g.:
  ID      group   reps
  IS22330_DS_1_.sorted    tol     1
  IS22330_DS_2_.sorted    tol     2
  IS22330_DS_3_.sorted    tol     3
  IS20351_DS_1_.sorted    sen     1
  IS20351_DS_2_.sorted    sen     2
  IS20351_DS_3_.sorted    sen     3
3. Covariate of the experiment: the covariate of interest, e.g. case/control, status, or time. The name must match a column name in the design matrix file; in this example it is "group".
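Because the covariate name has to match a column header exactly, it can help to check the design matrix before launching a job. The following is a minimal sketch, assuming the file is whitespace- or tab-delimited with the sample ID in the first column; the function name load_design is hypothetical.

```python
# The example design matrix from above, written with tab separators.
DESIGN_TEXT = (
    "ID\tgroup\treps\n"
    "IS22330_DS_1_.sorted\ttol\t1\n"
    "IS22330_DS_2_.sorted\ttol\t2\n"
    "IS22330_DS_3_.sorted\ttol\t3\n"
    "IS20351_DS_1_.sorted\tsen\t1\n"
    "IS20351_DS_2_.sorted\tsen\t2\n"
    "IS20351_DS_3_.sorted\tsen\t3\n"
)


def load_design(text, covariate):
    """Group sample IDs by the value of the covariate column.

    Raises ValueError if the covariate is not one of the column headers.
    """
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    if covariate not in header:
        raise ValueError(f"covariate {covariate!r} is not a column; found {header}")
    idx = header.index(covariate)
    groups = {}
    for row in rows:
        # Assumption: the sample ID is always the first column.
        groups.setdefault(row[idx], []).append(row[0])
    return groups


groups = load_design(DESIGN_TEXT, "group")
for level, samples in groups.items():
    print(level, len(samples), "samples")
```

Passing a name that is not a header, for instance "condition", raises a ValueError listing the columns that were found.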

Test/sample data

Test data for Ballgown are provided at /iplant/home/shared/iplantcollaborative/example_data/Ballgown/Ballgown_condor_app

Results

Successful execution of Ballgown will create a directory named output, which contains the following files:

Rplots.pdf - Boxplot of the FPKM distribution of each sample
results_gene.tsv - Gene-level differential expression with no filtering
results_gene_filter.sig.tsv - Genes with p-value < 0.05
results_gene_filter.tsv - Gene-level results after filtering low-abundance genes; here we remove all genes with a variance across samples of less than one
results_trans.tsv - Transcript-level differential expression with no filtering
results_trans_filter.sig.tsv - Transcripts with p-value < 0.05
results_trans_filter.tsv - Transcript-level results after filtering low-abundance transcripts; here we remove all transcripts with a variance across samples of less than one
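The two filters described above can be illustrated with a small sketch. It is not the app's own code: the toy FPKM values, the feature names, and the column names "id" and "pval" are assumptions chosen for the example, so check the header of your own .tsv files before reusing it.

```python
from statistics import variance


def filter_low_variance(fpkm_by_feature, min_variance=1.0):
    """Keep features whose sample variance across samples is at least min_variance."""
    return {
        feature: values
        for feature, values in fpkm_by_feature.items()
        if variance(values) >= min_variance
    }


def significant(rows, alpha=0.05, pval_key="pval"):
    """Keep rows whose p-value parses as a number and is below alpha."""
    kept = []
    for row in rows:
        try:
            pval = float(row[pval_key])
        except ValueError:
            continue  # skip non-numeric entries such as "NA"
        if pval < alpha:
            kept.append(row)
    return kept


# Toy FPKM values across four samples (made-up numbers).
fpkm = {
    "geneA": [10.0, 12.0, 30.0, 28.0],  # varies strongly between samples
    "geneB": [0.1, 0.2, 0.1, 0.2],      # low abundance, nearly constant
}
kept_features = filter_low_variance(fpkm)

# Toy result rows; "pval" is an assumed column name.
rows = [
    {"id": "geneA", "pval": "0.003"},
    {"id": "geneC", "pval": "0.2"},
    {"id": "geneD", "pval": "NA"},
]
sig_rows = significant(rows)

print(sorted(kept_features))
print([r["id"] for r in sig_rows])
```

Applying the variance filter before testing reduces the number of features tested, which is why the filtered and unfiltered result files can report different sets of significant features.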