Cuffcompare 2.2.1
This App runs Cuffcompare (version 2.2.1) to:
- Compare your assembled transcripts to a reference annotation
- Track Cufflinks transcripts across multiple experiments (e.g. across a time course)
App Creator
Amanda Cooksey
Info |
---|
Cufflinks has been recently updated to cufflinks 2.2.1 by Upendra Devisetty (upendra@cyverse.org) |
Quick Start
...
- Cuffcompare 2.2.1 takes Cufflinks’ GTF output as input and can optionally take a “reference” annotation.
Test Data
Info |
---|
Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> cuffcompare |
Input File(s)
Use the flower4_transcripts.gtf and flower6-7_transcripts.gtf files from the cuffcompare directory for an example run. Notes below are directly from the Cufflinks user manual.
A general example of a SAM alignment record, the input format that Cufflinks itself reads, is the following:
s6.25mer.txt-913508 16 chr1 4482736 255 14M431N11M * 0 0 CAAGATGCTAGGCAAGTCTTGGAAG IIIIIIIIIIIIIIIIIIIIIIIII NM:i:0 XS:A:-
Note the use of the custom tag XS. This attribute, which must have a value of "+" or "-", indicates which strand the RNA that produced this read came from. While this tag can be applied to any alignment, including unspliced ones, it must be present for all spliced alignment records (those with an 'N' operation in the CIGAR string).
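The XS requirement above can be checked before running the pipeline. The following is a minimal sketch, not part of Cufflinks or this app: the function name `spliced_without_xs` is hypothetical, and it assumes tab-separated SAM text with the 11 mandatory fields followed by optional tags.

```python
def spliced_without_xs(sam_lines):
    """Return the read names of spliced alignments lacking an XS:A:+ or XS:A:- tag."""
    offenders = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        cigar, tags = fields[5], fields[11:]
        # A spliced alignment is one whose CIGAR string contains an 'N' operation.
        if "N" in cigar and not any(t in ("XS:A:+", "XS:A:-") for t in tags):
            offenders.append(fields[0])
    return offenders


# The example record from the text, rebuilt with tab separators.
example = "\t".join([
    "s6.25mer.txt-913508", "16", "chr1", "4482736", "255", "14M431N11M",
    "*", "0", "0", "CAAGATGCTAGGCAAGTCTTGGAAG", "IIIIIIIIIIIIIIIIIIIIIIIII",
    "NM:i:0", "XS:A:-",
])
# The same record with the XS tag removed (hypothetical, for illustration only).
untagged = "\t".join(example.split("\t")[:-1])

print(spliced_without_xs([example]))
print(spliced_without_xs([untagged]))
```

The first call reports no offenders because the record carries `XS:A:-`; the second reports the read name because the spliced record no longer has an XS tag.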
The SAM file supplied to Cufflinks must be sorted by reference position. If you aligned your reads with TopHat, your alignments will be properly sorted already. If you used another tool, you may want to make sure they are properly sorted as follows:
sort -k 3,3 -k 4,4n hits.sam > hits.sam.sorted
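To confirm that a file is already in the order this command produces, a small check can be sketched as follows. This is an illustrative helper, not part of Cufflinks: the names `is_position_sorted` and `make_record` are hypothetical, and reference names are compared by plain string order, which matches `sort` only under the C locale (an assumption).

```python
def is_position_sorted(sam_lines):
    """Return True if records are ordered by reference name, then numeric position."""
    previous = None
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.split("\t")
        key = (fields[2], int(fields[3]))  # column 3 = reference, column 4 = position
        if previous is not None and key < previous:
            return False
        previous = key
    return True


def make_record(name, chrom, pos):
    """Build a minimal, hypothetical SAM record for demonstration."""
    return "\t".join(
        [name, "0", chrom, str(pos), "255", "25M", "*", "0", "0", "A" * 25, "I" * 25]
    )


sorted_hits = [
    make_record("r1", "chr1", 100),
    make_record("r2", "chr1", 2500),
    make_record("r3", "chr2", 50),
]
print(is_position_sorted(sorted_hits))
print(is_position_sorted(list(reversed(sorted_hits))))
```

The position is compared as an integer, mirroring the `n` modifier on the fourth sort key; comparing it as text would wrongly place position 2500 before position 300.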
A reference annotation file is not required but can be supplied. A custom reference annotation can be supplied by the user in the 'Custom Annotation File' field, or a reference annotation can be selected from the drop-down menu under 'Reference Annotation File'. For the example data, choose Arabidopsis thaliana (Ensembl 14) from the 'Reference Annotation File' drop-down menu.
Parameters Used in App
When the app is run in the Discovery Environment, use the following parameters with the above input file(s) to get the output provided in the section below.
...
Leave all other parameters as default.
Output File(s)
Cuffcompare produces 4 main output files (notes directly from the Cuffcompare user manual):
1) <outprefix>.combined.gtf
- Cuffcompare reports a GTF file containing the “union” of all transfrags in each sample. If a transfrag is present in both samples, it is thus reported once in the combined GTF.
2) <cuff_in>.refmap
- This tab-delimited file lists, for each reference transcript, which Cufflinks transcripts either fully or partially match it. There is one row per reference transcript.
3) <cuff_in>.tmap
- This tab-delimited file lists the most closely matching reference transcript for each Cufflinks transcript. There is one row per Cufflinks transcript.
4) <outprefix>.tracking
- This file matches transcripts up between samples. Each row contains a transcript structure that is present in one or more input GTF files. Because the transcripts will generally have different IDs (unless you assembled your RNA-Seq reads against a reference transcriptome), Cuffcompare examines the structure of each of the transcripts, matching transcripts that agree on the coordinates and order of all of their introns, as well as strand.
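The matching rule described for the tracking file (same intron coordinates, same intron order, same strand) can be illustrated with a short sketch. This is a simplified illustration under stated assumptions, not Cuffcompare's actual implementation: the function names, the sample labels `q1` and `q2`, and the transcript IDs are all hypothetical, exon coordinates are assumed to be 1-based and inclusive as in GTF, and single-exon transcripts are skipped because they have no introns to compare.

```python
from collections import defaultdict


def intron_chain(exons):
    """Derive the ordered introns (start, end) lying between sorted exons."""
    ordered = sorted(exons)
    return tuple(
        (ordered[i][1] + 1, ordered[i + 1][0] - 1) for i in range(len(ordered) - 1)
    )


def group_across_samples(samples):
    """Group transcripts that share chromosome, strand, and intron chain.

    samples maps a sample label to {transcript_id: (chrom, strand, exons)}.
    """
    groups = defaultdict(list)
    for sample, transcripts in samples.items():
        for transcript_id, (chrom, strand, exons) in transcripts.items():
            chain = intron_chain(exons)
            if not chain:
                continue  # single-exon transcript: no introns, skipped in this sketch
            groups[(chrom, strand, chain)].append((sample, transcript_id))
    return dict(groups)


samples = {
    "q1": {"CUFF.1.1": ("chr1", "+", [(100, 200), (300, 400), (500, 600)])},
    "q2": {
        # Different transcript ends, but identical introns: treated as a match.
        "CUFF.7.1": ("chr1", "+", [(150, 200), (300, 400), (500, 650)]),
        # Different intron coordinates: a separate structure.
        "CUFF.8.1": ("chr1", "+", [(100, 200), (350, 400)]),
    },
}

for key, members in group_across_samples(samples).items():
    print(key, members)
```

In this toy input, `CUFF.1.1` and `CUFF.7.1` land in one group even though their IDs and outer exon boundaries differ, which is why the tracking file can pair transcripts whose identifiers do not agree between samples.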
In the directory Community Data -> iplant_training -> intro_rna-seq -> 03_cufflinks, you will see directories for each of the selected bam files used as inputs. These directories also contain a "skipped.gtf" file.
Related Tutorials
Tool Source for App
- http://cole-trapnell-lab.github.io/cufflinks/cuffcompare/
- A customized script that calls the Cuffcompare binary was drafted by Amanda Cooksey; thus, not all available advanced options are exposed.