Cuffcompare 2.2.1

This App runs Cuffcompare (version 2.2.1) to:

- Compare your assembled transcripts to a reference annotation
- Track Cufflinks transcripts across multiple experiments (e.g. across a time course)

App Creator

Amanda Cooksey

Update

Cufflinks was recently updated to version 2.2.1 by Upendra Devisetty (upendra@cyverse.org).

Quick Start

...

Cuffcompare 2.2.1 takes Cufflinks’ GTF output as input and can optionally take a “reference” annotation.

Resources: http://cole-trapnell-lab.github.io/cufflinks/cuffcompare/

Test Data

Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> cuffcompare.

Input File(s)

Use the flower4_transcripts.gtf and flower6-7_transcripts.gtf files from the cuffcompare directory for an example run. Notes below are directly from the Cufflinks user manual.

A general example of the input file type is the following:
s6.25mer.txt-913508 16 chr1 4482736 255 14M431N11M * 0 0 CAAGATGCTAGGCAAGTCTTGGAAG IIIIIIIIIIIIIIIIIIIIIIIII NM:i:0 XS:A:-

Note the use of the custom tag XS. This attribute, which must have a value of "+" or "-", indicates which strand the RNA that produced this read came from. While this tag can be applied to any alignment, including unspliced ones, it must be present for all spliced alignment records (those with an 'N' operation in the CIGAR string).
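The XS requirement above can be checked before running the tools. The following is a minimal sketch, not part of Cufflinks: the function name `missing_xs_on_spliced` is hypothetical, and it assumes standard tab-separated SAM text with optional tags starting at the twelfth column.

```python
def missing_xs_on_spliced(sam_lines):
    """Return the read names of spliced records that lack a valid XS tag.

    A record counts as spliced when its CIGAR string (column 6) contains 'N'.
    """
    bad = []
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        if "N" not in fields[5]:
            continue  # unspliced: XS is optional
        xs_values = [f[5:] for f in fields[11:] if f.startswith("XS:A:")]
        if not xs_values or xs_values[0] not in ("+", "-"):
            bad.append(fields[0])
    return bad


# The example record from the text, written with tab separators.
spliced_ok = "\t".join([
    "s6.25mer.txt-913508", "16", "chr1", "4482736", "255", "14M431N11M",
    "*", "0", "0", "CAAGATGCTAGGCAAGTCTTGGAAG",
    "IIIIIIIIIIIIIIIIIIIIIIIII", "NM:i:0", "XS:A:-",
])
# The same record with the XS tag removed (illustrative only).
spliced_bad = "\t".join(spliced_ok.split("\t")[:-1]).replace(
    "s6.25mer.txt-913508", "no_xs_read")

print(missing_xs_on_spliced([spliced_ok, spliced_bad]))
```

Any read name reported by this check belongs to a spliced alignment that would need an XS tag added by the aligner.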

The SAM file supplied to Cufflinks must be sorted by reference position. If you aligned your reads with TopHat, your alignments will be properly sorted already. If you used another tool, you may want to make sure they are properly sorted as follows:

sort -k 3,3 -k 4,4n hits.sam > hits.sam.sorted
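To see whether a SAM file already satisfies this ordering, a quick check can be sketched as follows. This is an illustration only: `is_sorted_by_position` is a hypothetical helper, it assumes tab-separated SAM text, and it compares reference names as plain strings, which matches `sort` only under the C locale.

```python
def is_sorted_by_position(sam_lines):
    """Return True if records are ordered by reference name (column 3),
    then by numeric start position (column 4)."""
    previous = None
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.split("\t")
        key = (fields[2], int(fields[3]))
        if previous is not None and key < previous:
            return False
        previous = key
    return True


# Tiny illustrative records (only the first four columns matter here).
sorted_records = [
    "r1\t0\tchr1\t100",
    "r2\t0\tchr1\t200",
    "r3\t0\tchr2\t50",
]
unsorted_records = [
    "r2\t0\tchr1\t200",
    "r1\t0\tchr1\t100",
]

print(is_sorted_by_position(sorted_records))
print(is_sorted_by_position(unsorted_records))
```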

A reference annotation file is not required but can be supplied. A custom reference annotation can be supplied by the user in the 'Custom Annotation File' field, or a reference annotation can be selected from the drop-down menu under 'Reference Annotation File'. For the example data, choose Arabidopsis thaliana (Ensembl 14) from the 'Reference Annotation File' drop-down menu.

Parameters Used in App

When the app is run in the Discovery Environment, use the following parameters with the above input file(s) to get the output provided in the section below.

...

Leave all other parameters as default.

Output File(s)

Cuffcompare produces 4 main output files (notes directly from the Cuffcompare user manual):
1) <outprefix>_combined.gtf

Cuffcompare reports a GTF file containing the “union” of all transfrags in each sample. If a transfrag is present in both samples, it is reported only once in the combined GTF.

2) <cuff_in>.refmap

This tab-delimited file lists, for each reference transcript, which Cufflinks transcripts either fully or partially match it. There is one row per reference transcript.

3) <cuff_in>.tmap

This tab-delimited file lists the most closely matching reference transcript for each Cufflinks transcript. There is one row per Cufflinks transcript.

4) <outprefix>.tracking

This file matches transcripts up between samples. Each row contains a transcript structure that is present in one or more input GTF files. Because the transcripts will generally have different IDs (unless you assembled your RNA-Seq reads against a reference transcriptome), Cuffcompare examines the structure of each of the transcripts, matching transcripts that agree on the coordinates and order of all of their introns, as well as strand.
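The matching rule described above can be illustrated with a short sketch. This is not Cuffcompare's implementation: the names `intron_chain` and `group_by_structure` and the transcript IDs are hypothetical, exon coordinates are assumed to be 1-based and inclusive as in GTF, and single-exon transcripts are simply skipped because they have no introns to compare.

```python
from collections import defaultdict


def intron_chain(exons):
    """Return the ordered intron coordinates implied by a list of
    (start, end) exon intervals."""
    exons = sorted(exons)
    return tuple(
        (exons[i][1] + 1, exons[i + 1][0] - 1)
        for i in range(len(exons) - 1)
    )


def group_by_structure(transcripts):
    """Group transcript IDs that share chromosome, strand and intron chain.

    `transcripts` maps an ID to (chromosome, strand, exon list).
    """
    groups = defaultdict(list)
    for tid, (chrom, strand, exons) in transcripts.items():
        chain = intron_chain(exons)
        if chain:  # single-exon transcripts are skipped in this sketch
            groups[(chrom, strand, chain)].append(tid)
    return dict(groups)


# Hypothetical transfrags from two samples.
transcripts = {
    "sampleA.1": ("chr1", "+", [(100, 200), (300, 400), (500, 600)]),
    # Same introns as sampleA.1, but different outer exon boundaries.
    "sampleB.7": ("chr1", "+", [(120, 200), (300, 400), (500, 650)]),
    # Same coordinates as sampleA.1, but on the opposite strand.
    "sampleB.9": ("chr1", "-", [(100, 200), (300, 400), (500, 600)]),
}

for key, ids in group_by_structure(transcripts).items():
    print(key, ids)
```

In this example, sampleA.1 and sampleB.7 fall into the same group even though their outer ends differ, while sampleB.9 is kept apart because its strand differs.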

In the directory Community Data -> iplant_training -> intro_rna-seq -> 03_cufflinks, you will see directories for each of the selected bam files used as inputs. These directories also contain a "skipped.gtf" file.

Tool Source for App

http://cole-trapnell-lab.github.io/cufflinks/cuffcompare/
- A customized script to call the Cuffcompare binary was drafted by Amanda Cooksey, so not all available advanced options are exposed.
