Please work through the tutorial and add your comments at the bottom of this page, or send them by email to email@example.com. Thank you.
Rationale and background:
StringTie enables improved reconstruction of a transcriptome from RNA-seq reads
Mihaela Pertea, Geo M Pertea, Corina M Antonescu, Tsung-Cheng Chang, Joshua T Mendell & Steven L Salzberg
StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus. Its input can include not only the alignments of raw reads used by other transcript assemblers, but also alignments of longer sequences that have been assembled from those reads. In order to identify differentially expressed genes between experiments, StringTie's output can be processed by specialized software like Ballgown, Cuffdiff or other programs (DESeq2, edgeR, etc.).
The app expects the GTF files produced by StringTie to be located inside the per-sample sub-directories of the main output directory. It generates two CSV files containing the count matrices for genes and transcripts, using the coverage values found in the output of stringtie -e. This output can be used in differential expression analysis tools such as DESeq2 and edgeR.
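The expected input layout can be checked before launching the app. The following is a minimal Python sketch, assuming a parent directory with one sub-directory per sample, each holding at least one file ending in `.gtf`. The function names `find_sample_gtfs` and `missing_gtf_samples` are hypothetical helpers for illustration and are not part of the app.

```python
from pathlib import Path


def find_sample_gtfs(parent_dir):
    """Map each sample sub-directory name to the GTF file names found inside it."""
    samples = {}
    for sub in sorted(Path(parent_dir).iterdir()):
        if not sub.is_dir():
            continue  # only sub-directories are treated as samples
        samples[sub.name] = sorted(p.name for p in sub.glob("*.gtf"))
    return samples


def missing_gtf_samples(parent_dir):
    """Return the names of sample sub-directories that contain no GTF file."""
    return [name for name, gtfs in find_sample_gtfs(parent_dir).items() if not gtfs]
```

An empty list from `missing_gtf_samples` suggests that every sample sub-directory has a GTF file to read.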
- A CyVerse account. (Register for a CyVerse account here - user.cyverse.org)
- Mandatory arguments -
- the parent directory of the sample sub-directories (each containing a StringTie output file in GTF format)
- Optional arguments
- the average read length: 75 (the average read length)
- cluster genes that overlap with different gene IDs: uncheck (whether to cluster genes that overlap with different gene IDs)
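The average read length matters because the count matrices are derived from coverage values rather than from raw read counts. The StringTie manual describes the conversion as reads_per_transcript = coverage * transcript_len / read_len. The following Python sketch illustrates that formula; the function name `estimate_read_count` is hypothetical, and rounding up to a whole number is an assumption of this sketch rather than a documented behaviour of the app.

```python
import math


def estimate_read_count(coverage, transcript_length, read_length=75):
    """Approximate the number of reads assigned to a transcript.

    coverage          -- average per-base coverage reported for the transcript
    transcript_length -- transcript length in bases (sum of its exon lengths)
    read_length       -- average read length (75, as in the argument above)
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    # Rounding up to an integer is an assumption made for this illustration.
    return int(math.ceil(coverage * transcript_length / read_length))
```

For example, a 1,500-base transcript with a coverage of 10 and 75-base reads gives 10 × 1500 / 75 = 200 reads.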
The following test data are provided for testing StringTie-1.3.3_to_DESeq2_and_edegeR here - /iplant/home/shared/iplantcollaborative/example_data/StringTie/StringTie-1.3.3_to_DESeq2_and_edegeR:
- Directory of ballgown output files -
Run StringTie-1.3.3_to_DESeq2_and_edegeR on the GTF files in the ballgown directory.
Successful execution of StringTie-1.3.3_to_DESeq2_and_edegeR will produce several files and directories. It generates two CSV files containing the count matrices for genes and transcripts, using the coverage values found in the output of stringtie -e:
- gene_count_matrix.csv : gene count matrix
- transcript_count_matrix.csv : transcript count matrix
These count matrices (CSV files) can then be imported into R for use by DESeq2 and edgeR (using the DESeqDataSetFromMatrix and DGEList functions, respectively).
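Before importing a matrix into R, it can be useful to confirm that it has the expected shape. The following Python sketch assumes that the first column of the CSV holds the gene or transcript identifier and that each remaining column holds integer counts for one sample; the function names `read_count_matrix` and `library_sizes` are hypothetical and only illustrate such a check.

```python
import csv


def read_count_matrix(path):
    """Read a count matrix CSV into (sample_names, {feature_id: [counts]})."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        samples = header[1:]  # first column is assumed to hold the feature ID
        counts = {}
        for row in reader:
            if not row:
                continue  # skip blank lines
            counts[row[0]] = [int(value) for value in row[1:]]
    return samples, counts


def library_sizes(samples, counts):
    """Sum the counts per sample as a quick sanity check."""
    totals = [0] * len(samples)
    for values in counts.values():
        for i, value in enumerate(values):
            totals[i] += value
    return dict(zip(samples, totals))
```

A sample whose total is zero, or a row with non-integer values, would be worth investigating before running DESeq2 or edgeR.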
More information on the tool can be found here - https://ccb.jhu.edu/software/stringtie/index.shtml?t=manual