StringTie-1.3.3_merge

Please work through the tutorial and add your comments at the bottom of this page, or send them by email to kchougul@cshl.edu. Thank you.

Rationale and background:

 

StringTie enables improved reconstruction of a transcriptome from RNA-seq reads

Mihaela Pertea, Geo M Pertea, Corina M Antonescu, Tsung-Cheng Chang, Joshua T Mendell & Steven L Salzberg

 doi:10.1038/nbt.3122

 

StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus. Its input can include not only the alignments of raw reads used by other transcript assemblers, but also alignments of longer sequences that have been assembled from those reads. To identify differentially expressed genes between experiments, StringTie's output can be processed by specialized software such as Ballgown, Cuffdiff, or other programs (DESeq2, edgeR, etc.).


Version: 1.3.3


When StringTie is run with the --merge option, it takes a list of GTF/GFF files as input and merges/assembles their transcripts into a non-redundant set of transcripts. This step creates a uniform set of transcripts for all samples, which facilitates the downstream calculation of differential expression levels for all transcripts across the experimental conditions. The output is a merged GTF file containing all merged gene models, but without any numeric results for coverage, FPKM, or TPM. With this merged GTF, StringTie can then re-estimate abundances when it is run again with the -e option on the original set of alignment files.
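The re-estimation step described above can be sketched as a small Python helper that only builds the command lines and does not run them. The -e, -G and -o options are StringTie's own; the function name, the BAM file names and the "reestimated" output directory are hypothetical and chosen only for illustration.

```python
from pathlib import Path
from typing import Dict, List


def build_reestimate_commands(
    merged_gtf: str,
    bam_files: List[str],
    out_dir: str = "reestimated",  # hypothetical output directory
) -> Dict[str, List[str]]:
    """Build one `stringtie -e` command per alignment file (nothing is executed)."""
    commands: Dict[str, List[str]] = {}
    for bam in bam_files:
        sample = Path(bam).stem  # e.g. "sample1" from "sample1.bam"
        out_gtf = str(Path(out_dir) / f"{sample}.gtf")
        # -e restricts estimation to the transcripts given with -G (the merged GTF).
        commands[sample] = ["stringtie", "-e", "-G", merged_gtf, "-o", out_gtf, bam]
    return commands


if __name__ == "__main__":
    # Hypothetical alignment files; substitute the BAM files used for the original assembly.
    for sample, cmd in build_reestimate_commands(
        "merged.out.gtf", ["sample1.bam", "sample2.bam"]
    ).items():
        print(sample, "->", " ".join(cmd))
```

Each returned list can be passed to subprocess.run once StringTie is installed and the alignment files are in place.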

Pre-Requisites

  1. A CyVerse account. (Register for a CyVerse account here - user.cyverse.org)
  2. Mandatory arguments -
    1. Input GTF files to merge (in GTF format)
  3. Optional arguments
    1. Annotation: provide a GTF file or select one from the list (a reference annotation file in GTF/GFF3 format can be provided to StringTie)
    2. Output file name for the merged transcripts GTF: merged.out.gtf
    3. Minimum isoform fraction: 0.1 (minimum abundance of a transcript, as a fraction of the most abundant transcript at the same locus)
    4. Minimum assembled transcript length: 50 (minimum input transcript length to include in the merge)
    5. Minimum reads per bp coverage to consider for transcript assembly: 0 (minimum input transcript coverage to include in the merge)
    6. Minimum input transcript FPKM to include in the merge: 0
    7. Minimum input transcript TPM to include in the merge: 0
    8. Keep merged transcripts with retained introns: unchecked
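For readers who want to see how these settings relate to a StringTie command line, here is a minimal sketch. It assumes the app passes each field to the corresponding StringTie merge-mode option (-G, -o, -f, -m, -c, -F, -T, -i); that mapping is an assumption, since the app wrapper itself is not shown on this page. The function name is hypothetical, the defaults mirror the values listed above, and the command is only assembled, not executed.

```python
from typing import List, Optional


def build_merge_command(
    input_gtfs: List[str],
    annotation: Optional[str] = None,
    output: str = "merged.out.gtf",
    min_isoform_fraction: float = 0.1,
    min_length: int = 50,
    min_coverage: float = 0,
    min_fpkm: float = 0,
    min_tpm: float = 0,
    keep_retained_introns: bool = False,
) -> List[str]:
    """Return an argument list for `stringtie --merge` (nothing is executed)."""
    if not input_gtfs:
        raise ValueError("at least one input GTF file is required")
    cmd = ["stringtie", "--merge"]
    if annotation:
        cmd += ["-G", annotation]  # optional reference annotation
    cmd += [
        "-o", output,                     # merged transcripts GTF
        "-f", str(min_isoform_fraction),  # minimum isoform fraction
        "-m", str(min_length),            # minimum input transcript length
        "-c", str(min_coverage),          # minimum input transcript coverage
        "-F", str(min_fpkm),              # minimum input transcript FPKM
        "-T", str(min_tpm),               # minimum input transcript TPM
    ]
    if keep_retained_introns:
        cmd.append("-i")  # keep merged transcripts with retained introns
    return cmd + list(input_gtfs)


if __name__ == "__main__":
    print(" ".join(build_merge_command(
        ["sample1.gtf", "sample2.gtf"], annotation="reference.gtf"
    )))
```

Building the command as a list rather than a single string avoids shell quoting problems if a file name contains spaces.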


Test/sample data 

The following test data are provided for testing StringTie1.3.3_merge at /iplant/home/shared/iplantcollaborative/example_data/StringTie/StringTie1.3.3_merge:

  1. Reference GTF file - reference.gtf
  2. GTF files to merge -
    1. sample1.gtf
    2. sample2.gtf

Run StringTie1.3.3_merge on the GTF files using the reference file.

Results 

The output of a successful StringTie1.3.3_merge run includes:

  1. merged.out.gtf: merged GTF file with all merged gene models
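A quick sanity check of the merged file is to count its records by feature type. The sketch below relies only on the standard GTF layout (nine tab-separated columns, with the feature type in the third column). The function name and the two example records are made up for illustration and are not taken from the test data.

```python
from collections import Counter
from typing import Iterable


def count_features(gtf_lines: Iterable[str]) -> Counter:
    """Count GTF records by feature type (column 3), skipping comments and blank lines."""
    counts: Counter = Counter()
    for line in gtf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    # Made-up records in GTF layout; in practice pass an open file handle,
    # for example: with open("merged.out.gtf") as handle: count_features(handle)
    example = [
        "# example header line",
        'chr1\tStringTie\ttranscript\t100\t500\t1000\t+\t.\tgene_id "MSTRG.1"; transcript_id "MSTRG.1.1";',
        'chr1\tStringTie\texon\t100\t500\t1000\t+\t.\tgene_id "MSTRG.1"; transcript_id "MSTRG.1.1";',
    ]
    for feature, n in sorted(count_features(example).items()):
        print(f"{feature}\t{n}")
```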



More information on the tool can be found here - https://ccb.jhu.edu/software/stringtie/index.shtml?t=manual