
Cuffcompare 2.2.1

This App runs Cuffcompare (version 2.2.1) to:

  • Compare your assembled transcripts to a reference annotation
  • Track Cufflinks transcripts across multiple experiments (e.g. across a time course)

App Creator

Amanda Cooksey

Update

This app was recently updated to Cufflinks 2.2.1 by Upendra Devisetty (upendra@cyverse.org).

Quick Start


  • Cuffcompare 2.2.1 takes Cufflinks’ GTF output as input and can optionally take a “reference” annotation (also in GTF format).
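Outside the Discovery Environment, a roughly equivalent command-line run might look like the sketch below. The `-r` (reference annotation) and `-o` (output prefix) options are standard Cuffcompare options; the file names `reference.gtf`, `assembly1.gtf` and `assembly2.gtf` and the prefix `cuffcmp` are hypothetical placeholders, and this is not necessarily the exact command the app issues. When `cuffcompare` is not installed, the script only prints the command it would run.

```shell
#!/bin/sh
# Hypothetical inputs: two Cufflinks assemblies and an optional reference annotation.
REF=reference.gtf
PREFIX=cuffcmp
set -- assembly1.gtf assembly2.gtf

if command -v cuffcompare >/dev/null 2>&1; then
    # -r: reference annotation (optional); -o: prefix for the output files
    cuffcompare -r "$REF" -o "$PREFIX" "$@"
else
    # Dry run when cuffcompare is not on PATH: show the command that would run
    echo "cuffcompare -r $REF -o $PREFIX $*"
fi
```

With the prefix `cuffcmp`, the combined annotation and the tracking file would be named `cuffcmp.combined.gtf` and `cuffcmp.tracking`.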

Test Data

Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplant_training iplantcollaborative -> introexample_rna-seq data -> 02_tophatcuffcompare

Input File(s)

Use the flower4_transcripts.gtf and flower6-7_transccripts.gtf files from the cuffcompare directory for an example run. The notes below are taken directly from the Cufflinks user manual.

A general example of the input file type (a SAM alignment record) is the following:

s6.25mer.txt-913508 16 chr1 4482736 255 14M431N11M * 0 0 CAAGATGCTAGGCAAGTCTTGGAAG IIIIIIIIIIIIIIIIIIIIIIIII NM:i:0 XS:A:-

Note the use of the custom tag XS. This attribute, which must have a value of "+" or "-", indicates which strand the RNA that produced this read came from. While this tag can be applied to any alignment, including unspliced ones, it must be present for all spliced alignment records (those with a 'N' operation in the CIGAR string).
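The XS rule above can be checked with a short awk filter. This is a sketch, not part of Cufflinks: it assumes a tab-delimited SAM file (here the hypothetical `hits.sam`, built from the example record plus a made-up spliced record `read2` with no XS tag), skips header lines beginning with `@`, treats any record whose CIGAR string (column 6) contains `N` as spliced, and prints the name of each spliced record that lacks an `XS:A:+` or `XS:A:-` tag.

```shell
#!/bin/sh
# Hypothetical test file: the example record above, plus a spliced record without XS.
printf 's6.25mer.txt-913508\t16\tchr1\t4482736\t255\t14M431N11M\t*\t0\t0\tCAAGATGCTAGGCAAGTCTTGGAAG\tIIIIIIIIIIIIIIIIIIIIIIIII\tNM:i:0\tXS:A:-\n' > hits.sam
printf 'read2\t0\tchr1\t4500000\t255\t10M200N15M\t*\t0\t0\tCAAGATGCTAGGCAAGTCTTGGAAG\tIIIIIIIIIIIIIIIIIIIIIIIII\tNM:i:0\n' >> hits.sam

# Print the name (column 1) of every spliced alignment that has no valid XS tag.
check_xs() {
    awk -F'\t' '
        /^@/ { next }                       # skip header lines
        $6 ~ /N/ {                          # CIGAR contains N: spliced alignment
            ok = 0
            for (i = 12; i <= NF; i++)      # optional tags start at column 12
                if ($i == "XS:A:+" || $i == "XS:A:-") ok = 1
            if (!ok) print $1
        }' "$1"
}

missing=$(check_xs hits.sam)
echo "spliced records without XS: ${missing:-none}"
```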

The SAM file supplied to Cufflinks must be sorted by reference position. If you aligned your reads with TopHat, your alignments will be properly sorted already. If you used another tool, you may want to make sure they are properly sorted as follows:

sort -k 3,3 -k 4,4n hits.sam > hits.sam.sorted
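The `sort` command above treats every line alike, including SAM header lines (those beginning with `@`), and splits fields on runs of blanks. A slightly more defensive sketch, assuming a tab-delimited SAM file with an optional header (the small `hits.sam` built here is hypothetical), keeps the header on top, splits only on tabs, and sets `LC_ALL=C` so that reference names are ordered byte-wise regardless of the locale:

```shell
#!/bin/sh
TAB=$(printf '\t')

# Hypothetical input: one header line and three unsorted alignments.
printf '@HD\tVN:1.0\n' > hits.sam
printf 'r1\t0\tchr2\t100\t255\t25M\t*\t0\t0\t*\t*\n' >> hits.sam
printf 'r2\t0\tchr1\t900\t255\t25M\t*\t0\t0\t*\t*\n' >> hits.sam
printf 'r3\t0\tchr1\t50\t255\t25M\t*\t0\t0\t*\t*\n' >> hits.sam

{
    grep '^@' hits.sam                              # header lines stay first
    grep -v '^@' hits.sam |
        LC_ALL=C sort -t "$TAB" -k 3,3 -k 4,4n      # reference name, then numeric position
} > hits.sam.sorted

cut -f 1 hits.sam.sorted
```

The last command prints the read names in their new order: the header, then `r3`, `r2` (both on chr1, by position) and `r1` (chr2).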

A reference annotation file is not required but can be supplied. A custom reference annotation can be supplied in the 'Custom Annotation File' field, or a reference annotation can be selected from the 'Reference Annotation File' drop-down menu. For the example data, choose Arabidopsis thaliana (Ensembl 14) from the 'Reference Annotation File' drop-down menu.

Parameters Used in App

When the app is run in the Discovery Environment, use the following parameters with the above input file(s) to get the output provided in the section below.

...

Leave all other parameters as default.

Output File(s)

Cuffcompare produces 4 main output files (notes taken directly from the Cuffcompare user manual):
1) <outprefix>.combined.gtf

  • Cuffcompare reports a GTF file containing the “union” of all transfrags in each sample. If a transfrag is present in both samples, it is reported only once in the combined GTF.

2) <cuff_in>.refmap

  • This tab-delimited file lists, for each reference transcript, which Cufflinks transcripts either fully or partially match it. There is one row per reference transcript.

3) <cuff_in>.tmap

  • This tab-delimited file lists the most closely matching reference transcript for each Cufflinks transcript. There is one row per Cufflinks transcript.

4) <outprefix>.tracking

  • This file matches transcripts up between samples. Each row contains a transcript structure that is present in one or more input GTF files. Because the transcripts will generally have different IDs (unless you assembled your RNA-Seq reads against a reference transcriptome), Cuffcompare examines the structure of each of the transcripts, matching transcripts that agree on the coordinates and order of all of their introns, as well as strand.

In the directory Community Data -> iplant_training -> intro_rna-seq -> 03_cufflinks, you will see directories for each of the selected bam files used as inputs. These directories also contain a "skipped.gtf" file.

Related Tutorials

RNA-Seq Tutorial (DE 1.8)

Tool Source for App