The DE Quick Start tutorial provides an introduction to basic DE functionality and navigation.
Rationale and background:
Ballgown: Ballgown bridges the gap between transcriptome assembly and expression analysis
Nat Biotechnol. 2015 Mar; 33(3): 243–246.
Ballgown takes the transcripts and expression levels from StringTie and applies rigorous statistical methods to determine which transcripts are differentially expressed between two or more experiments. Distributed as an R/Bioconductor package, Ballgown includes plotting tools that help visualize the results. It uses the abundance data produced by StringTie to perform differential expression analysis at the gene, transcript, exon, or junction level, and it supports both time-series and fixed-condition designs.
The required inputs for Ballgown are the *.ctab files that StringTie generates for each sample.
Ballgown is also available as an R Shiny web app on the SciApps analysis platform for interactive visualization of the differential expression results. The tar.gz output from the DE Ballgown app serves as the input for visualization in SciApps.
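Before launching the app, it can help to confirm that every sample actually has its ctab files. The sketch below is only an illustration: it assumes one sub-directory per sample (a hypothetical layout, not something this page specifies) and checks for the five table names that StringTie writes when run with its -B option. The helper names `missing_ctabs` and `check_samples` are made up for this example.

```python
from pathlib import Path

# Table files StringTie writes per sample when run with -B.
EXPECTED_CTABS = ("e2t.ctab", "e_data.ctab", "i2t.ctab", "i_data.ctab", "t_data.ctab")


def missing_ctabs(sample_dir):
    """Return the expected ctab file names that are absent from sample_dir."""
    present = {p.name for p in Path(sample_dir).glob("*.ctab")}
    return [name for name in EXPECTED_CTABS if name not in present]


def check_samples(ctab_root):
    """Map each sample sub-directory under ctab_root to its missing ctab files.

    Assumes (hypothetically) one sub-directory per sample.
    """
    root = Path(ctab_root)
    return {d.name: missing_ctabs(d) for d in sorted(root.iterdir()) if d.is_dir()}
```

An empty list for a sample means all five tables were found; any names listed point to a sample whose StringTie run may need to be repeated.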
Ballgown software (https://github.com/alyssafrazee/ballgown)
Pre-Requisites
- A CyVerse account. (Register for a CyVerse account at user.cyverse.org.)
Before using the Ballgown R package, a few preprocessing steps are necessary:
- RNA-Seq reads should be aligned to a reference genome.
- A transcriptome should be assembled, or a reference transcriptome should be downloaded.
- Expression for the features (transcripts, exons, and intron junctions) in the transcriptome should be estimated in a Ballgown-readable format.
- Two sample pipelines for preprocessing are as follows:
  - Pipeline 1: TopHat2 + StringTie
  - Pipeline 2: TopHat2 + Cufflinks + Tablemaker
- Both pipelines produce output in a Ballgown-readable format.
Mandatory arguments
- Directory of ctab files from the StringTie output
- Design matrix file, e.g.:
ID group reps
IS22330_DS_1_.sorted tol 1
IS22330_DS_2_.sorted tol 2
IS22330_DS_3_.sorted tol 3
IS20351_DS_1_.sorted sen 1
IS20351_DS_2_.sorted sen 2
IS20351_DS_3_.sorted sen 3
- Covariate of the experiment: the covariate of interest, e.g. case/control, status, or time. The name must match a column name in the design matrix file; in this example it is "group".
Test data for Ballgown are provided at: /iplant/home/shared/iplantcollaborative/example_data/Ballgown/Ballgown_condor_app
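The requirement that the covariate name match a column in the design matrix can be checked before submitting a job. The following is a minimal sketch, not the app's own validation: it assumes the design matrix is whitespace-delimited as displayed above, and the function names `parse_design` and `group_samples` are hypothetical.

```python
# Design matrix copied from the example above (whitespace-delimited is an assumption).
DESIGN = """\
ID group reps
IS22330_DS_1_.sorted tol 1
IS22330_DS_2_.sorted tol 2
IS22330_DS_3_.sorted tol 3
IS20351_DS_1_.sorted sen 1
IS20351_DS_2_.sorted sen 2
IS20351_DS_3_.sorted sen 3
"""


def parse_design(text):
    """Split the design matrix into a header list and one dict per sample row."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, records = lines[0], lines[1:]
    return header, [dict(zip(header, rec)) for rec in records]


def group_samples(text, covariate):
    """Return {covariate level: [sample IDs]}; fail if the covariate column is absent."""
    header, rows = parse_design(text)
    if covariate not in header:
        raise ValueError(f"covariate {covariate!r} is not a column; found {header}")
    groups = {}
    for row in rows:
        groups.setdefault(row[covariate], []).append(row["ID"])
    return groups


groups = group_samples(DESIGN, "group")
```

Calling `group_samples(DESIGN, "condition")` raises a `ValueError`, which is the kind of mismatch a misspelled covariate name would cause.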
Results
Successful execution of Ballgown creates a directory named output, which contains the following files:
- Rplots.pdf - Boxplot of the FPKM distribution of each sample
- results_gene.tsv - Gene-level differential expression with no filtering
- results_gene_filter.sig.tsv - Genes with p value < 0.05
- results_gene_filter.tsv - Gene-level results after filtering low-abundance genes; genes with a variance across samples of less than one are removed
- results_trans.tsv - Transcript-level differential expression with no filtering
- results_trans_filter.sig.tsv - Transcripts with p value < 0.05
- results_trans_filter.tsv - Transcript-level results after filtering low-abundance transcripts; transcripts with a variance across samples of less than one are removed
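The two filters described in the list above (variance across samples, then p value) can be illustrated with a small sketch. This is not the app's code: the FPKM values and p values are invented for demonstration, the column name `pval` is an assumption about the TSV layout, and the variance used here is the sample variance.

```python
from statistics import variance


def filter_low_variance(fpkm_by_feature, min_variance=1.0):
    """Drop features whose FPKM variance across samples is below min_variance."""
    return {
        feature: values
        for feature, values in fpkm_by_feature.items()
        if variance(values) >= min_variance
    }


def significant(results, alpha=0.05):
    """Keep result rows whose p value is below alpha ('pval' key is an assumption)."""
    return [row for row in results if row["pval"] < alpha]


# Invented example values, not taken from the test data set.
fpkm = {
    "geneA": [10.0, 12.0, 30.0, 33.0],  # varies strongly between samples
    "geneB": [0.1, 0.2, 0.1, 0.2],      # low abundance, nearly constant
}
results = [{"id": "geneA", "pval": 0.01}, {"id": "geneC", "pval": 0.2}]

kept = filter_low_variance(fpkm)
hits = significant(results)
```

Here `kept` retains only geneA, and `hits` contains only the geneA row, mirroring the relationship between the unfiltered, filtered, and significant result files.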