UARK RNA-Seq Tutorial Copy
RNA-Seq using the Discovery Environment
Rationale and background:
RNA-seq refers to whole transcriptome shotgun sequencing of cDNA, generally using an ultra-high-throughput ("next-generation") sequencing technology. The value of RNA-seq is the ability generate deep-coverage information about a sample's RNA. This can be used for a variety of purposes such as transcriptome assembly, gene discovery/annotation, and detecting differential transcript abundances between tissues, developmental stages, genetic backgrounds, and environmental conditions. In this example we will compare gene transcript abundance between wild type and mutant Arabidopisis strains.
LONG HYPOCOTYL 5 (HY5) is a basic leucine zipper transcription factor (TF) that functions downstream of multiple families of photoreceptors. Mutations in the HY5 gene cause a myriad of aberrant phenotypes in Arabidopsis, including elongated hypocotyl, reduced accumulation of pigments, halted chloroplast development in greening hypocotyls, altered root morphology and defective hormonal and stimulus responses. HY5 thus acts as an integrator that links various gene networks to coordinate plant development.
We will use RNAseq to compare expression levels for genes between WT and hy5- samples and to identify new transcripts or isoforms. In this tutorial, we will use data stored at the NCBI Sequence Read Archive to simulate us having done the lab work ourselves.
- Align the data to the Arabidopsis genome using TopHat
- Identify differentially-expressed genes using CuffDiff
- Use tools in the Discovery Environment to explore the differential gene expression data.
If you do not have an account, please see one of the on-site iPlant staff for a temporary account.
Specific Objectives
By the end of this module, you should
- Be more familiar with the DE user interface
- Understand the starting data for RNA-seq analysis
- Be able to align short sequence reads with a reference genome in the DE
- Be able to analyze differential gene expression in the DE and Atmosphere
Note on Staged Data:
Several of the methods in this tutorial can take 1 to 2 hrs to complete on a full-sized data set. So that you can complete the tutorial in the allotted time, we have pre-staged input and output files in the 'Community Data' folder for each step. You can start your analyses then skip to the next step using pre-staged data. Also, we have filtered the original read data (downloaded from NCBI's Sequence Read Archive) to generate fastq files containing pseudo-replicates derived from a random subset of each library.
Original data from NCBI Sequence Read Archive study SRP003951 (all are derived from Illumina singe-end reads):
- SRR70570 WT RNA-seq replicate 1
- SRR70571 WT RNA-seq replicate 2
- SRR70572 hy5 RNA-seq replicate 1
- SRR70573 hy5 RNA-seq replicate 2
The Staged Fastq Data can be found in the
'Community Data'->iplant_training->intro_rna-seq->01_input_data
Section 1: Align RNA-seq reads using Tophat
Tophat is a specialized alignment software for RNA-seq reads that is aware of splice junctions when aligning to a reference assembly. It can be found by clicking on the Apps icon. The most recent App for Tophat will accept multiple fastq files.
a) Navigate to the Apps window by clicking on the icon. Look for TopHat2-SE (TopHat2 for Illumina paired-end reads) in the 'Public Applications' category under NGS->transcriptome_profiling->Tuxedo RNA-Seq 2. Click on the name of the App to open it.
b) In the Input Data section, you can click 'Add' at the top right of the FASTQ File(s) box to navigate to the folder containing the FASTQ files:
'Community Data'->iplant_training->intro_rna-seq->01_input_data
Note: For multiple input files, it is convenient to open a separate data window, navigate to the above folder and drag all of files into the input data box. Each of the input files will be processed independently. For convenience, a batch of FASTQ files can be analyzed together but these files can also be processed concurrently in independent TopHat runs.
c) In the Ref. Genome section, Use the drop-down menu to select "Arabidopsis thaliana (Ensembl 14)" for the reference genome. This is the TAIR10 release.
d) In the Ref. Annotations section, use the drop-down menu to select "Arabidopsis thaliana (Ensembl 14)" for the reference gene annotations.
e) Start the analysis by clicking the Launch Analysis button, naming the analysis 'tophat' in the dialog box.
Reasonable default options are provided for the analysis settings.
f) Tophat will require some time to complete its work on each sample. Click on the Analyses icon to view the status of your submitted analysis.
g) When the analysis is complete, navigate to the results under your 'analyses' folder. The principle outputs from TopHat are BAM files, each set of alignments is in its own folder, corresponding to the name of the original Fastq file. This is the most time-consuming step of this training module. If your analysis has not completed in time, you can skip ahead by using the pre-computed results:
'Community Data'->iplant_training->intro_rna-seq->02_tophat
In the tophat_out folder, you should see these folders:
Section 2: Assemble transcripts with Cufflinks
Cufflinks assembles RNA-Seq alignments into a parsimonious set of transcripts, then estimates the relative abundances of these transcripts based on how many reads support each one, taking into account biases in library preparation protocols. A detailed manual can be found at http://cole-trapnell-lab.github.io/cufflinks/manual/. We will be using a reference annotation file for '''Arabidopsis thaliana''' to perform Reference annotation based assembly (RABT), which facilitates discovery of known and novel transcripts. For further discussion, see http://cole-trapnell-lab.github.io/cufflinks/papers/
a) Click on the Apps icon and find Cufflinks2. Open it.
b) In the 'Select Input Data' section, open a separate data window for the 'bam' folder from tophat_out (above) and select and drag all four bam files into the input box. For convenience, a batch of TopHat bam files can be analyzed together but these files can also be processed concurrently in independent Cufflinks runs.
c) In the Reference Sequence section, select Arabidopsis thaliana from the pull-down menu.
d) Run Cufflinks. When it is complete, you will see the following outputs in cufflinks_out
The outputs of cufflinks are GTF and FPKM files. For example, look in one of the four output folders:
Examine the GTF file, transcripts.gtf. This file contains annotated transcripts assembled by cufflinks, using the annotated transcripts selected from the pulldown menu in the Cufflinks app as a guide.
1 Cufflinks transcript 3631 5899 1 + . gene_id "AT1G01010"; transcript_id "AT1G01010.1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000"; 1 Cufflinks exon 3631 3913 1 + . gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000"; 1 Cufflinks exon 3996 4276 1 + . gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000"; 1 Cufflinks exon 4486 4605 1 + . gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "3"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000"; 1 Cufflinks exon 4706 5095 1 + . gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "4"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000"; 1 Cufflinks exon 5174 5326 1 + . gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "5"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000"; 1 Cufflinks exon 5439 5899 1 + . gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "6"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
Examine the FPKM file, genes.fpkm_tracking. This files gives the normalized abundance of each transcript, expressed in fragments per kilobase of exon per million fragments mapped (FPKM) and their upper and lower 95% confidence interval.
tracking_id class_code nearest_ref_id gene_id gene_short_name tss_id locus length coverage FPKM FPKM_conf_lo FPKM_conf_hi FPKM_status AT1G01010 - - AT1G01010 ANAC001 - 1:3630-5899 - - 0 0 0 OK AT1G01020 - - AT1G01020 ARV1 - 1:5927-8737 - - 6.22574 1.51952 10.9319 OK AT1G01030 - - AT1G01030 NGA3 - 1:11648-13714 - - 0.921783 0 2.22538 OK AT1G01040 - - AT1G01040 DCL1 - 1:23145-31227 - - 0.129701 0 0.389104 OK AT1G01050 - - AT1G01050 AtPPa1 - 1:31169-33153 - - 6.08883 1.11732 11.0603 OK AT1G01046 - - AT1G01046 MIR838A - 1:28499-28706 - - 0 0 0 OK AT1G01070 - - AT1G01070 AT1G01070 - 1:38751-40944 - - 1.92728 0 4.75919 OK AT1G01073 - - AT1G01073 AT1G01073 - 1:44676-44787 - - 0 0 0 OK
Section 3: Merge all Cufflinks transcripts into a single transcriptome annotation file using Cuffmerge
Cuffmerge merges together several Cufflinks assemblies. It handles also handles running Cuffcompare. The main purpose of this application is to make it easier to make an assembly GTF file suitable for use with Cuffdiff. A merged, empirical annotation file will be more accurate than using the standard reference annotation, as the expression of rare or novel genes and alternative splicing isoforms seen in this experiment will be better reflected in the empirical transcriptome assemblies. For an informative discussion on this topic, see http://seqanswers.com/forums/showthread.php?t=16422.
a) Open the Cuffmerge2 app. Under 'Input Data', browse to the results of the cufflinks analyses (cufflinks_out). Select the four GTF files in cufflinks_out/gtf as input for Cuffmerge2.
b) Under Reference Data, select the Arabidopsis genome annotation and genome sequence using the pull-down menus.
c) Run the App. When it is complete, you should see the following outputs under cuffmerge_out:
Section 4: Compare expression analysis using CuffDiff
Cuffdiff is a program that uses the Cufflinks transcript quantification engine to calculate gene and transcript expression levels in more than one condition and test them for significant differences. Comprehensive documentation of this tool is given through the Nature Protocols paper linked at http://cole-trapnell-lab.github.io/cufflinks/cuffdiff/index.html.
a) Open a separate data window for the bam files generated by tophat (above).
b) In Apps select CuffDiff2
c) Enter the name of your first Sample (WT).
d) Add BAM files all replicates from this sample, either from your tophat output directory or the pre-staged data. Ignore the BAM index (.bai) files. Cuffdiff merges the replicate data and uses comparison between replicates as part of calculating the test statistic.
d) Repeat steps (b) and (c) for the second sample (hy5)
e) Next, open the Reference Data section and add a custom reference annotation file using the "merged_with_ref_ids.gtf" file from the cuffmerge output folder.
f) Launch the analysis
g) Examine the output files.
The folder cuffdiff_out contains the output files from cuffdiff.
The folder graphs contains some basic plots produced using the R package cummeRbund (http://compbio.mit.edu/cummeRbund/).
The folder sorted_data contains some summary documents, suitable for import into spreadsheets, that sort gene and isoform expression by fold-change or total expression.
CuffDiff Outputs
After a successful run you will find many output files under cuffdiff_out. We will focus on four files related to differential gene expression at the gene and transcript levels:
- genes.fpkm_tracking Transcript abundance at the gene level in units of fragments per Kb of exon per million mapped reads (FPKM)
- gene_exp.diff Results of differential expression testing between the two samples, WT and hy5, at the gene level
- isoforms.fpkm_tracking Transcript abundance at the transcript isoform level in units of fragments per Kb of exon per million mapped reads (FPKM)
- isoform_exp.diff Results of differential expression testing between the two samples, WT and hy5, at the transcript isoform level
Explanation of Outputs:
gene_exp.diff contains gene FPKM values for the WT and hy5 samples and ln(fold change) values.
Statistical significance is given by the p_value and q_value and a determination of whether the there is significant differential expression is given in the last column (with values of 'yes' or 'no'). The q-value is a corrected p-value for multiple tests based on the user supplied false discovery rate (FDR: default 0.05).
test_id gene_id gene locus sample_1 sample_2 status value_1 value_2 log2(fold_change) test_stat p_value q_value significant AT1G01010 AT1G01010 ANAC001 1:3630-5899 WT hy5 NOTEST 0 1.68901 1.79769e+308 1.79769e+308 0.0786496 1 no AT1G01020 AT1G01020 ARV1 1:5927-8737 WT hy5 OK 7.00799 8.85126 0.336881 -0.405048 0.685442 0.816558 no AT1G01030 AT1G01030 NGA3 1:11648-13714 WT hy5 NOTEST 1.44458 1.10952 -0.380717 0.22392 0.822819 1 no AT1G01040 AT1G01040 DCL1 1:23145-33153 WT hy5 NOTEST 0.312212 3.04201 3.28442 -2.31653 0.0205292 1 no AT1G01046 AT1G01046 MIR838A 1:23145-33153 WT hy5 NOTEST 0 0 0 0 1 1 no AT1G01050 AT1G01050 AtPPa1 1:23145-33153 WT hy5 OK 12.4652 24.0248 0.946618 -1.48078 0.138664 0.34761 no AT1G01060 AT1G01060 LHY 1:33378-37871 WT hy5 OK 5.2891 8.01668 0.599983 -0.961352 0.336375 0.55581 no AT1G01070 AT1G01070 AT1G01070 1:38751-40944 WT hy5 NOTEST 1.99104 3.22537 0.695944 -0.355638 0.722111 1 no AT1G01073 AT1G01073 AT1G01073 1:44676-44787 WT hy5 NOTEST 0 0 0 0 1 1 no AT1G01080 AT1G01080 AT1G01080 1:45295-47019 WT hy5 OK 15.4469 20.5669 0.413009 -0.750314 0.453065 0.647742 no AT1G01090 AT1G01090 PDH-E1 ALPHA 1:47484-49286 WT hy5 OK 71.0777 41.2999 -0.783261 2.9202 0.00349811 0.0284027 yes
Section 5: Examine the gene expression data
Below is an example of some differentially regulated genes from the output file genes.sorted_by_expression.sig.txt, sorted by total expression levels (hy5 + WT transcripts). Note the preponderance of down-regulated genes in hy5.
Below is another example of isoform (transcript)-level expression, this time sorted by fold-change. The source file is transcripts.sorted_by_fold.sig.txt.
b) Click on the WT_hy5_volcano_plot.png under graphs in the cuffdiff results. Note that there are a lot more significant values on left side of the graph (down-regulated genes).
RNA-Seq in Atmosphere
Section 1: Connect to an instance of an Atmosphere Image (virtual server)
1. Organize into groups of 2-3 people. Go to https://atmo.iplantcollaborative.org and log in with IPLANT TEST USER CREDENTIALS.
2. There should already be a VM named RNA-Seq Visualization launched and ready for you to connect to. The screen should resemble this one:
5. Launch the standalone VNC Viewer application on your desktop computer and enter YOUR VM's IP address following by :1 (The IP address is provided in the "My Instances" listing at the left, just underneath the name of the instance you are launching, e.g. 128.196.142.97 in this example.)
Once you have connected, you will see your virtual Desktop, a Linux server running in the Atmosphere cloud computing platform. You will interact with it the same way as you would a physical computer.
Section 2: Visualize TopHat outputs in the Integrated Genome Viewer (IGV).
TopHat output data from an RNA-Seq genome alignment using the full SRA accession SRR70570 from the above example data set has been pre-staged in the TopHat folder on your virtual desktop. The folder contains the alignment data in a bam file (SRR70570_WT1.bam) and split-read (intron/exon) junction data in the file junctions.bed.
1. Open IGV by double-clicking on the IGV icon on your dekstop. It will take a moment to load. Do not click the icon again if nothing appears to happen for a few seconds. This implementation of IGV is configured to load the Arabidopsis thaliana genome (TAIR10) by default.
2. Find the File->Load From File menu.
3. Navigate to tophat_out folder on your desktop and select the BAM and BED files. IGV will load the index file automatically, you do not need to select it also. Click OK. This will your TopHat BAM alignment files into IGV.
4. Maximize the IGV window. You should be looking at a whole genome summary view. You can focus on a chromosome by clicking on the chromosome name and zoom in to particular regions by drag-selecting on the ruler or using the navigation tool in the top-right corner of the window.
5. To focus in on a gene, enter Chr1:25048000-25049500. This will zoom to the highly expressed gene RBCS1A (ribulose bisphosphate carboxylase small chain 1A). Note the read coverage and the splicing and other data produced by TopHat. The alignments and splicing data were used by Cufflinks to assemble the transcripts.
6. Zoom to other areas of the genome to find other examples of expressed genes.
Section 3: Get started with R and cummeRbund
Notes on the command line interface
We will be working on the command line for parts of this tutorial
- Attention to detail is critical. The most common errors we encounter are:
- typos
- case sensitivity
inappropriate spaces, for example, never put a space between the '-' and the following command line flag. For example
-i
is not the same as
- i
- unbalanced quotes and brackets. For example, If you open a parenthesis with (, always make sure to close with ). While typing, if you always balance your quotes and brackets first, then insert the contents, you can avoid these common errors.
- Never hit the Enter key until you are finished writing the whole command
Cuffdiff output data generated as described in the discovery environment are pre-staged in the cuffdiff_out folder on your desktop.
1. Open a terminal window by double clicking on the Terminal icon on your desktop.
2. Navigate to your Cuffdiff output data
cd Desktop/cuffdiff_out
3. View the contents of the folder with the ls command
$ ls cds.count_tracking isoforms.count_tracking cds.diff isoforms.fpkm_tracking cds.fpkm_tracking isoforms.read_group_tracking cds.read_group_tracking promoters.diff cds_exp.diff read_groups.info cuffData.db run.info gene_exp.diff splicing.diff genes.count_tracking tss_group_exp.diff genes.fpkm_tracking tss_groups.count_tracking genes.read_group_tracking tss_groups.fpkm_tracking isoform_exp.diff tss_groups.read_group_tracking
4. Launch R
$ R R version 2.15.3 (2013-03-01) -- "Security Blanket" Copyright (C) 2013 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: x86_64-unknown-linux-gnu (64-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. >
This is the R interactive shell, which we will use to enter commands specific to R.
5. Load the cummeRbund package
> library(cummeRbund)
6. Check the version
> citation("cummeRbund") To cite package 'cummeRbund' in publications use: L. Goff, C. Trapnell and D. Kelley (2012). cummeRbund: Analysis, exploration, manipulation, and visualization of Cufflinks high-throughput sequencing data.. R package version 2.0.0. A BibTeX entry for LaTeX users is @Manual{, title = {cummeRbund: Analysis, exploration, manipulation, and visualization of Cufflinks high-throughput sequencing data.}, author = {L. Goff and C. Trapnell and D. Kelley}, year = {2012}, note = {R package version 2.0.0}, } ATTENTION: This citation information has been auto-generated from the package DESCRIPTION file and may need manual editing, see 'help("citation")' .
7. Load the cuffdiff data into a database named cuff
> cuff <- readCufflinks(rebuild = TRUE)
8. Have a look at the database summary:
> cuff CuffSet instance with: 2 samples 33714 genes 43481 isoforms 35113 TSS 32924 CDS 33621 promoters 35113 splicing 27350 relCDS
Section 4: Plot data
The squared coefficient of variation is a normalized measure of cross-replicate variability that can be useful for evaluating the quality your RNA-seq data. Run an SCV plot.
> fpkmSCVPlot(genes(cuff))
Draw a density plot of genes in the two samples.
> dens <- csDensity(genes(cuff))
This will load the results of the analysis into an object named dens. You can now plot the graph.
> dens
You can skip directly to the graph like so:
>csDensity(genes(cuff))
Draw a scatter plot. This command also requires X and Y arguments, 'WT' and 'hy5', respectively.
> csScatter(genes(cuff), 'WT', 'hy5',smooth=T)
Draw a Volcano matrix plot.
> csVolcanoMatrix(isoforms(cuff), 'WT', 'hy5')
Save a Volcano matrix plot to a PNG file
> png('../volcano_plot.png') > csVolcanoMatrix(isoforms(cuff), 'WT', 'hy5') > dev.off()
Section 5: Explore the data at the gene level
how many significantly up or down regulated genes are there at various thresholds?
> sig <- getSig(cuff, alpha=0.05, level='genes') > length(sig) [1] 5701 > sig <- getSig(cuff, alpha=0.01, level='genes') > length(sig) [1] 4189 > sig <- getSig(cuff, alpha=10E-8, level='genes') > length(sig) [1] 1235 > sig <- getSig(cuff, alpha=0, level='genes') > length(sig) [1] 466
Have a look at the last 100 genes with a q-value (corrected p-value) of 0. Do you notice anything about the last few genes names?
> tail(sig, 100) [1] "AT4G38250" "AT4G38680" "AT4G38770" "AT4G38920" "AT4G39200" [6] "AT4G39363" "AT4G40040" "AT5G01530" "AT5G02380" "AT5G02500" [11] "AT5G02960" "AT5G03240" "AT5G03545" "AT5G03850" "AT5G04140" [16] "AT5G04290" "AT5G05370" "AT5G06640" "AT5G07322" "AT5G08450" [21] "AT5G10450" "AT5G10960" "AT5G11740" "AT5G12140" "AT5G13930" [26] "AT5G14040" "AT5G14120" "AT5G14320" "AT5G16250" "AT5G17820" [31] "AT5G17870" "AT5G18380" "AT5G19120" "AT5G19220" "AT5G19510" [36] "AT5G20290" "AT5G20630" "AT5G20790" "AT5G21020" "AT5G21940" [41] "AT5G22580" "AT5G22640" "AT5G23820" "AT5G24165" "AT5G24490" [46] "AT5G24930" "AT5G25630" "AT5G26570" "AT5G27330" "AT5G27660" [51] "AT5G27700" "AT5G27770" "AT5G27850" "AT5G28540" "AT5G29890" [56] "AT5G35190" "AT5G40450" "AT5G41471" "AT5G41700" "AT5G41790" [61] "AT5G42110" "AT5G42530" "AT5G42650" "AT5G44020" "AT5G44190" [66] "AT5G44580" "AT5G45350" "AT5G45510" "AT5G47020" "AT5G47110" [71] "AT5G47560" "AT5G47700" "AT5G47890" "AT5G49440" "AT5G50920" [76] "AT5G52310" "AT5G52650" "AT5G53560" "AT5G54270" "AT5G54770" [81] "AT5G55660" "AT5G57290" "AT5G59030" "AT5G59690" "AT5G59850" [86] "AT5G60390" "AT5G60850" "AT5G63530" "AT5G64040" "AT5G64840" [91] "AT5G66040" "AT5G66570" "AT5G67030" "AT5G67070" "AT5G67360" [96] "XLOC_008325" "XLOC_011655" "XLOC_013774" "XLOC_033160" "XLOC_033162" >
Get the gene data from the cuff database.
> sigGenes <- getGenes(cuff,sig) Getting gene information: FPKM Differential Expression Data Annotation Data Replicate FPKMs Counts Getting isoforms information: FPKM Differential Expression Data Annotation Data Replicate FPKMs Counts Getting CDS information: FPKM Differential Expression Data Annotation Data Replicate FPKMs Counts Getting TSS information: FPKM Differential Expression Data Annotation Data Replicate FPKMs Counts Getting promoter information: distData Getting splicing information: distData Getting relCDS information: distData >
Examine the gene set.
> sigGenes CuffGeneSet instance for 468 genes Slots: annotation fpkm repFpkm diff count isoforms CuffFeatureSet instance of size 664 TSS CuffFeatureSet instance of size 470 CDS CuffFeatureSet instance of size 574 promoters CuffFeatureSet instance of size 468 splicing CuffFeatureSet instance of size 470 relCDS CuffFeatureSet instance of size 468 >
Note that there are more isoforms (transcripts) than genes in the set.
Section 6: More Visual Analytics
Let's see what the scatter plot looks with this subset of genes
> csScatter(sigGenes, 'WT', 'hy5')
We can also do a heat map plot of these genes. First, let's take a smaller subset.
> sigGenes <- getGenes(cuff,tail(sig,100))
.
Clustered the genes by similar expression profiles.
> csHeatmap(sigGenes, cluster='both')
This graph is too dense, we can stretch it vertically using a PNG device.
> png(filename = 'thin_heatmap.png', width = 400, height = 1000, units = 'px') > csHeatmap(sigGenes, cluster='both') > dev.off()
Finally, consider gene FP3, we can find other genes in the database with similar expression patterns:
> sim = findSimilar(cuff,'FP3',n=20) > sim CuffGeneSet instance for 20 genes Slots: annotation fpkm repFpkm diff count isoforms CuffFeatureSet instance of size 30 TSS CuffFeatureSet instance of size 23 CDS CuffFeatureSet instance of size 26 promoters CuffFeatureSet instance of size 20 splicing CuffFeatureSet instance of size 23 relCDS CuffFeatureSet instance of size 20
Now, plot the expression pattern(s)
> expressionPlot(sim,logMode=T,showErrorbars=F)