The CyVerse App Store is currently being restructured, and apps are being moved to an HPC environment. During this transition, users may occasionally be unable to locate or use apps that are listed in our tutorials. In many cases, these apps can be found by searching for them with the search bar at the top of the Apps window in the DE. To improve the chance of a successful search, do not enter the entire app name and version number; search only for the portion that refers to the app's function or origin (e.g. 'SOAPdenovo' instead of 'SOAPdenovo-Trans 1.01').
Also, as part of the 2.8 app categorization, a number of apps were deprecated and are no longer available, and there is no longer an Archive category. You can search for a suitable replacement in the List of Applications in this window, or search on an app name or tool used for an app in the Apps window search field. If you need an app reinstated, please contact email@example.com.
Please work through the documentation and add your comments at the bottom of this page, or email comments to firstname.lastname@example.org. Thank you.
TACO: Multi-sample transcriptome assembly from RNA-Seq
Transcriptome assemblers reconstruct full-length transcripts from the short sequence fragments generated by RNA-Seq. Large consortia such as TCGA, ICGC, GTEx, ENCODE, the Cancer Cell Line Encyclopedia (CCLE), and others have performed RNA-Seq on thousands of human tissues and cell lines, providing an unparalleled resource for investigating transcriptional diversity and complexity. Transcriptome Assemblies Combined into One (TACO) is an algorithm that reconstructs a consensus transcriptome from a collection of individual assemblies. TACO employs change point detection to break apart complex loci and correctly delineate transcript start and end sites, and a dynamic programming approach to assemble transcripts from a network of splicing patterns. According to its authors (reference below), TACO outperforms existing software tools such as Cuffmerge and StringTie merge.
Niknafs, Y. S., Pandian, B., Iyer, H. K., Chinnaiyan, A. M. & Iyer, M. K. TACO produces robust multisample transcriptome assemblies from RNA-seq. Nat. Methods 14, 68–70 (2017).
- GTF attribute field: GTF attribute field containing the expression estimate. The default setting is FPKM for Cufflinks GTF input.
- Filter min length: Pre-filters input transfrags with length < N prior to assembly. Set to 0 to disable this filter.
- Filter max expr: Pre-filters input transfrags with expression < X. The units of the expression cutoff value X correspond to the units specified by the gtf-expr-attr parameter, which is FPKM by default. Set to 0.0 to disable this filter.
- Isoform fraction: Report transcript isoforms with expression fraction >=FRAC relative to the highest expressed gene. For each gene, the highest abundance isoform will be reported with a
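The two pre-filters described above can be illustrated with a minimal Python sketch. This is not TACO's implementation: the `Transfrag` record and the `prefilter` function with its `min_length` and `min_expr` arguments are hypothetical names introduced only for illustration. The sketch assumes, as the descriptions state, that transfrags with length < N or expression < X are removed and that a cutoff of 0 (or 0.0) disables the corresponding filter.

```python
from dataclasses import dataclass


@dataclass
class Transfrag:
    """Hypothetical minimal record for one input transfrag (illustration only)."""
    transcript_id: str
    length: int   # transfrag length in bases
    expr: float   # value of the chosen GTF expression attribute, e.g. FPKM


def prefilter(transfrags, min_length=0, min_expr=0.0):
    """Keep transfrags that pass both cutoffs.

    A transfrag is dropped when length < min_length or expr < min_expr.
    A cutoff of 0 (or 0.0) disables that filter.
    """
    kept = []
    for t in transfrags:
        if min_length > 0 and t.length < min_length:
            continue  # too short
        if min_expr > 0.0 and t.expr < min_expr:
            continue  # expression below the cutoff
        kept.append(t)
    return kept


example = [
    Transfrag("T1", 150, 5.0),    # fails the length cutoff below
    Transfrag("T2", 900, 0.2),    # fails the expression cutoff below
    Transfrag("T3", 1200, 3.1),   # passes both
]
kept_ids = [t.transcript_id for t in prefilter(example, min_length=200, min_expr=0.5)]
print(kept_ids)
```

With both cutoffs left at zero, every transfrag in the example is kept, which mirrors the "set to 0 to disable" behaviour described in the list.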
All files are located in the Community Data directory of the CyVerse Discovery Environment at the following path:
Community Data > iplantcollaborative > example_data > taco (/iplant/home/shared/iplantcollaborative/example_data/taco)
- Input list file:
Make sure the input files and the input list file are in the same directory.
Leave all parameters at their default values.
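If you need to build an input list file for your own data, the following Python sketch shows one way to do it. It rests on assumptions not stated on this page: that the list file is a plain text file with one GTF file name per line, and that listing bare file names is sufficient when the list sits in the same directory as the GTF files. The function name `write_input_list` and the file name `gtf_files.txt` are hypothetical.

```python
from pathlib import Path
import tempfile


def write_input_list(gtf_dir, list_name="gtf_files.txt"):
    """Write the names of all *.gtf files in gtf_dir to a list file in that directory.

    Assumption: one file name per line, sorted alphabetically.
    """
    gtf_dir = Path(gtf_dir)
    gtf_names = sorted(p.name for p in gtf_dir.glob("*.gtf"))
    list_path = gtf_dir / list_name
    list_path.write_text("".join(f"{name}\n" for name in gtf_names))
    return list_path


# Demonstration in a temporary directory with two empty placeholder GTF files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("sample_b.gtf", "sample_a.gtf"):
        (Path(tmp) / name).write_text("")
    listing = write_input_list(tmp).read_text()

print(listing)
```

Keeping the list file next to the GTF files matches the advice above that the inputs and the list should share one directory.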
TACO writes output to the directory specified by the -o command line option. Within this directory, the important output files are:
Transcriptome assembly: assembly.gtf
This GTF file contains TACO's assembled isoforms. The first 8 columns are standard GTF, and the last column contains attributes, some of which are also standardized (“gene_id” and “transcript_id”). There is one GTF record per row, and each record represents either a transcript or an exon within a transcript. The columns are defined as follows:
|Column number||Column name||Example||Description|
|1||seqname||chrX||Chromosome or contig name|
|2||source||taco||The name of the program that generated this file (always taco)|
|3||feature||exon||The type of record (always either “transcript” or “exon”).|
|4||start||77696957||The leftmost coordinate of this record (where 1 is the leftmost possible coordinate)|
|5||end||77712009||The rightmost coordinate of this record, inclusive.|
|6||score||1000||The most abundant isoform for each gene is assigned a score of 1000. Minor isoforms are scored by the ratio (minor FPKM/major FPKM)|
|7||strand||+||TACO's guess for which strand the isoform came from. Always one of “+”, “-”, “.”|
|8||frame||.||TACO does not predict where the start and stop codons (if any) are located within each transcript, so this field is not used.|
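To show how the column layout above maps onto a record in practice, here is a small Python sketch that splits one tab-delimited GTF row into named fields. The helper `parse_gtf_line` is a hypothetical name, and the example row is assembled from the example values in the table rather than taken from real TACO output.

```python
GTF_COLUMNS = [
    "seqname", "source", "feature", "start", "end",
    "score", "strand", "frame", "attributes",
]


def parse_gtf_line(line):
    """Split one tab-delimited GTF row into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GTF_COLUMNS):
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    record = dict(zip(GTF_COLUMNS, fields))
    # GTF coordinates are 1-based and inclusive on both ends.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    return record


# Illustrative row built from the example values in the table (not real output).
line = "\t".join([
    "chrX", "taco", "exon", "77696957", "77712009", "1000", "+", ".",
    'gene_id "G7"; transcript_id "TU56";',
])
record = parse_gtf_line(line)
span = record["end"] - record["start"] + 1  # inclusive length of the record
print(record["feature"], span)
```

Because both coordinates are inclusive, the length of a record is end - start + 1.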
Each GTF record is decorated with the following attributes:
|Attribute||Example||Description|
|gene_id||G7||TACO gene id|
|transcript_id||TU56||TACO transcript id|
|locus_id||L1||TACO locus id|
|tss_id||TSS31||TACO transcription start site id|
|expr||2.441||Isoform-level abundance. The units correspond to the expression units of the input transfrags (usually FPKM or TPM)|
|rel_frac||0.7647||Relative abundance of isoform compared to the major isoform in the gene. The most abundant isoform for each gene is assigned a rel_frac of 1.0. Minor isoforms are scored by the ratio (minor expr/major expr)|
|abs_frac||0.7647||Relative abundance of isoform compared to the total expression of all isoforms in the gene. Isoforms are scored by the ratio (expr / sum(expr(x) for each isoform x)).|
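The relationship between expr, rel_frac, and abs_frac can be sketched in Python as follows. The attribute parser assumes the usual GTF attribute syntax (`key "value";`, with quotes treated as optional); the exact quoting in TACO's output is not confirmed here. The function names and the two-isoform example are hypothetical.

```python
import re

# Matches `key "value";` and also unquoted `key value;` (assumed GTF syntax).
ATTR_RE = re.compile(r'(\w+)\s+"?([^";]+)"?\s*;')


def parse_attributes(text):
    """Turn a GTF attribute string into a dict of key -> string value."""
    return dict(ATTR_RE.findall(text))


def isoform_fractions(expr_by_isoform):
    """Return {transcript_id: (rel_frac, abs_frac)} for the isoforms of one gene.

    rel_frac = expr / expr of the major isoform
    abs_frac = expr / sum of expr over all isoforms of the gene
    """
    major = max(expr_by_isoform.values())
    total = sum(expr_by_isoform.values())
    if major <= 0:
        raise ValueError("at least one isoform must have positive expression")
    return {tid: (e / major, e / total) for tid, e in expr_by_isoform.items()}


attrs = parse_attributes('gene_id "G7"; transcript_id "TU56"; expr "2.441";')
print(attrs["transcript_id"], float(attrs["expr"]))

# Hypothetical gene with a major isoform (3.0) and a minor isoform (1.0).
fractions = isoform_fractions({"TU56": 3.0, "TU57": 1.0})
print(fractions)
```

In this example the major isoform has a rel_frac of 1.0 and an abs_frac of 0.75, while the minor isoform has a rel_frac of one third and an abs_frac of 0.25.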
Transcriptome assembly: assembly.bed
This BED file contains TACO's assembled isoforms. Please refer to the UCSC genome browser's detailed description of the BED format. The name column (Column 4) contains a string of the format
Transfrag coverage profiles: expr.pos.bedgraph, expr.neg.bedgraph, expr.none.bedgraph
TACO outputs 3 bedGraph files with the coverage profile of the input transfrags on the forward, reverse, and unknown/unspecified strands. Please refer to the UCSC genome browser's detailed description of the bedGraph format. These files can be converted to bigWig format using the free conversion tool bedGraphToBigWig for viewing on genome browsers such as IGV or UCSC.
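As a rough sketch of the conversion step, the Python snippet below builds the bedGraphToBigWig command line for each of the three coverage files. bedGraphToBigWig takes an input bedGraph, a chromosome sizes file, and an output path, in that order. The directory name `taco_output`, the file name `chrom.sizes`, and the `.bw` output names are assumptions for illustration; you must supply a chromosome sizes file for your own genome, and the tool expects the bedGraph to be sorted by chromosome and start position.

```python
from pathlib import Path

BEDGRAPHS = ["expr.pos.bedgraph", "expr.neg.bedgraph", "expr.none.bedgraph"]


def bigwig_commands(output_dir, chrom_sizes):
    """Build one bedGraphToBigWig argument list per TACO coverage file.

    The commands are returned rather than executed, so they can be inspected
    first and then run, for example with subprocess.run(cmd, check=True).
    """
    commands = []
    for name in BEDGRAPHS:
        src = Path(output_dir) / name
        dst = src.with_suffix(".bw")  # expr.pos.bedgraph -> expr.pos.bw
        commands.append(["bedGraphToBigWig", str(src), str(chrom_sizes), str(dst)])
    return commands


cmds = bigwig_commands("taco_output", "chrom.sizes")
for cmd in cmds:
    print(" ".join(cmd))
```

Returning the argument lists instead of running them keeps the sketch usable on machines where the UCSC tool is not installed.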
Transfrag splice junction profiles: splice_junctions.bed
A UCSC BED track of junctions reported by TACO. Each junction consists of two connected BED blocks. The score is the sum of the expression values of transfrags supporting the junction. This file can be converted to bigBed track format for viewing on genome browsers such as IGV or UCSC.
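The two-block layout can be turned into intron coordinates with a short Python sketch. It assumes the file uses the standard 12-column UCSC BED layout, in which blockSizes and blockStarts are columns 11 and 12 and blockStarts are relative to chromStart. The function name `parse_junction` and the example line are hypothetical.

```python
def parse_junction(bed_line):
    """Extract the intron spanned by a two-block BED12 junction record.

    Returned coordinates are 0-based, half-open, as in BED.
    """
    f = bed_line.rstrip("\n").split("\t")
    if len(f) < 12:
        raise ValueError("expected a 12-column BED record")
    chrom_start = int(f[1])
    # BED block lists may carry a trailing comma, so skip empty items.
    block_sizes = [int(x) for x in f[10].split(",") if x]
    block_starts = [int(x) for x in f[11].split(",") if x]
    if int(f[9]) != 2 or len(block_sizes) != 2 or len(block_starts) != 2:
        raise ValueError("a junction record should have exactly two blocks")
    return {
        "chrom": f[0],
        # The intron begins where the first block ends ...
        "intron_start": chrom_start + block_starts[0] + block_sizes[0],
        # ... and ends where the second block begins.
        "intron_end": chrom_start + block_starts[1],
        "score": float(f[4]),  # summed expression of supporting transfrags
        "strand": f[5],
    }


# Made-up example: blocks of 50 and 60 bases flanking one intron.
example_line = "\t".join([
    "chr1", "1000", "2000", "J1", "12.5", "+",
    "1000", "2000", "0", "2", "50,60", "0,940",
])
junction = parse_junction(example_line)
print(junction)
```

For the made-up record above, the first block covers 1000 to 1050 and the second covers 1940 to 2000, so the intron spans 1050 to 1940.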
Tool Source for App