clusterProfiler
The VICE Quick Start tutorial provides an introduction to VICE, a visual and interactive computing environment for running interactive apps.
Please work through the documentation and add your comments at the bottom of this page, or email them to support@cyverse.org.
Rationale and background:
In recent years, high-throughput experimental techniques such as microarrays, RNA-Seq, and mass spectrometry have made it possible to detect cellular molecules at the systems level. These analyses generate huge quantities of data that need a biological interpretation. A commonly used approach is to cluster along the gene dimension, grouping genes by their similarities (Yu et al. 2010).
To search for shared functions among genes, a common approach is to incorporate biological knowledge, such as Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG), to identify the predominant biological themes of a collection of genes. The clusterProfiler package implements methods to analyze and visualize functional profiles of genomic coordinates (supported by ChIPseeker), genes, and gene clusters.
After clustering analysis, researchers want not only to determine whether a particular gene cluster has a common theme, but also to compare the biological themes among gene clusters. Manually choosing interesting clusters and then running enrichment analysis on each selected cluster is slow and tedious. To bridge this gap, we designed clusterProfiler (Yu et al. 2012) for comparing and visualizing functional profiles among gene clusters.
Pre-Requisites:
+ You will need a CyVerse account to complete this exercise (register here: https://user.cyverse.org).
+ Discovery Environment access - You will get access to Discovery Environment by default. Check the ‘My Services’ tab; if Discovery Environment is not listed, click the ‘Available’ tab, locate Discovery Environment and click the link to request access.
+ An up-to-date, Java-enabled web browser. (Firefox is recommended. If you wish to work with your own large datasets and upload them using iCommands, Chrome is not suitable because of its issues with 64-bit Java.)
Supported Analyses
- Over-Representation Analysis
- Gene Set Enrichment Analysis
- Biological theme comparison
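Over-representation analysis is commonly framed as a hypergeometric test: given a background of genes, how surprising is the overlap between a gene list and the genes annotated to one term? The sketch below works through one such test in base R. All counts are made up for illustration, and it is not meant to reproduce the package's exact implementation.

```r
# Toy over-representation test in base R (all counts are made up).
universe_size <- 20000  # background genes
set_size      <- 200    # background genes annotated to the term
list_size     <- 500    # genes in the list of interest
overlap       <- 15     # genes of interest annotated to the term

# P(X >= overlap) under the hypergeometric distribution
p_value <- phyper(overlap - 1, set_size, universe_size - set_size,
                  list_size, lower.tail = FALSE)

# The same question asked as a one-sided Fisher's exact test on a 2x2 table
tab <- matrix(c(overlap, list_size - overlap,
                set_size - overlap,
                universe_size - set_size - (list_size - overlap)),
              nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(p_value)
print(p_fisher)
```

With these counts only about 5 overlapping genes would be expected by chance (500 × 200 / 20000), so an overlap of 15 gives a small p-value.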
Species Supported:
Name | Database |
---|---|
Human | org.Hs.eg.db |
Mouse | org.Mm.eg.db |
Rhesus | org.Mmu.eg.db |
Chimp | org.Pt.eg.db |
Rat | org.Rn.eg.db |
Yeast | org.Sc.sgd.db |
Pig | org.Ss.eg.db |
Xenopus | org.Xl.eg.db |
Anopheles | org.Ag.eg.db |
Arabidopsis | org.At.tair.db |
Bovine | org.Bt.eg.db |
Worm | org.Ce.eg.db |
Canine | org.Cf.eg.db |
Fly | org.Dm.eg.db |
Zebrafish | org.Dr.eg.db |
E coli strain K12 | org.EcK12.eg.db |
E coli strain Sakai | org.EcSakai.eg.db |
Chicken | org.Gg.eg.db |
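One way to use the table above in a script is to keep it as a named lookup and pick the database by species name. This is a minimal sketch: the names `org_db`, `species`, and `db_name` are illustrative, only a subset of the table is shown, and the package-loading lines are commented out because they assume the chosen annotation package is installed.

```r
# Subset of the species table as a named character vector
org_db <- c(Human       = "org.Hs.eg.db",
            Mouse       = "org.Mm.eg.db",
            Rat         = "org.Rn.eg.db",
            Arabidopsis = "org.At.tair.db",
            Zebrafish   = "org.Dr.eg.db")

species <- "Mouse"            # illustrative choice
db_name <- org_db[[species]]  # look up the package name

# Assuming the package is installed, it could then be loaded by name:
# library(db_name, character.only = TRUE)
# OrgDb <- get(db_name)

print(db_name)
```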
1. Launching the clusterProfiler VICE app
First, log in to the CyVerse DE, then open the Apps window and search for ‘clusterProfiler’.
- Log in to the CyVerse DE
- Open the Apps window and search for the 'clusterProfiler' RStudio-VICE app
2. Launch analysis
Launch the app by selecting an example folder and then clicking Launch Analysis. You can also select different input files and/or folders and then launch the app.
Tip: You can use input files to import a script into the app.
Note: If you selected input by folder, you will not see any files when selecting the folder. Rest assured that they will be there once the app begins to run.
3. Navigate to clusterProfiler app url
- After the analysis starts running, open your notifications and click the ‘Access your running Analysis here’ URL.
- On the page that opens, enter ‘rstudio’ as both the username and the password.
4. Write/Run your code
In the RStudio script section, you can write your code, generate plots, save plots, and so on. You can find an example script below.
5. Complete your analysis
Complete your analysis by opening the Analyses window, selecting the clusterProfiler analysis, and clicking the ‘Complete and Save Outputs’ option under the “Analyses” button. After you have done this, you can find any outputs that you generated by following the same steps, but this time selecting ‘Go To Output Folder’.
Warning: Currently, a VICE app can run for at most 48 hours, after which it is terminated. Make sure you complete your analysis within 48 hours.
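Because the session is time-limited, it can help to write plots and tables to files as you go rather than leaving them only in the RStudio plot pane. The sketch below uses base R with toy data. The file names are hypothetical, and it assumes that files written to the working directory are among the outputs saved when the analysis is completed.

```r
# Hypothetical output file names in the current working directory
plot_file  <- "example_plot.pdf"
table_file <- "example_table.csv"

# Toy data standing in for real results
counts <- c(termA = 12, termB = 7, termC = 3)

pdf(plot_file, width = 7, height = 5)  # open a PDF graphics device
barplot(counts, main = "Toy counts")   # draw into the file, not the plot pane
invisible(dev.off())                   # close the device so the file is written

# Save the matching table next to the plot
write.csv(data.frame(term = names(counts), count = unname(counts)),
          table_file, row.names = FALSE)
```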
Example Analysis Script:
Section 1 - Gene Annotation
1. Load required libraries
library("AnnotationDbi")
library(clusterProfiler)
library("org.Xx.eg.db")  # Load the required species database, e.g. org.Hs.eg.db for human
2. Set annotation database
OrgDb <- org.Xx.eg.db  # Use the same database object that was loaded above
3. Read in your Gene list from file
res <- read.table("GeneList.txt", header=TRUE)
4. Assign GeneId as row names
row.names(res) <- res$Id
5. bitr: Biological Id TranslatoR
gene.df <- bitr(res$Id, fromType = "ENSEMBL",
toType = c("ENTREZID", "SYMBOL"),
OrgDb = OrgDb)
6. Create gene list
gene <- gene.df$ENTREZID
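The steps above expect a gene list file with a header line and an `Id` column. The sketch below shows that layout with an inline stand-in for GeneList.txt. The IDs and the `log2FoldChange` column are made up for illustration. It also drops repeated IDs first, since assigning row names fails when they are not unique.

```r
# Inline stand-in for GeneList.txt: header line, then one gene per row.
# IDs and the log2FoldChange column are made up for illustration.
gene_text <- "Id log2FoldChange
ENSG00000000001 1.8
ENSG00000000002 -0.6
ENSG00000000002 -0.6
ENSG00000000003 2.3"

res <- read.table(text = gene_text, header = TRUE, stringsAsFactors = FALSE)

# Row names must be unique, so drop repeated IDs before assigning them
res <- res[!duplicated(res$Id), ]
row.names(res) <- res$Id

print(res)
```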
Section 2 - GO & KEGG Analysis
1. GO classification - gene classification based on GO distribution at a specific level.
ggo <- groupGO(gene = gene,
OrgDb = OrgDb,
ont = "BP",
level = 3,
readable = TRUE)
2. GO classification plot
barplot(ggo, drop=TRUE, showCategory=12)
3. GO over-representation test
ego <- enrichGO(gene = gene,
OrgDb = OrgDb,
ont = "BP",
pAdjustMethod = "BH",
pvalueCutoff = 0.05,
qvalueCutoff = 0.05,
readable = TRUE)
4. Barplot visualization of GO enriched genes
barplot(ego, showCategory=25)
5. Dotplot visualization of GO enriched genes
dotplot(ego, showCategory=25)
6. Complex association plot
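cnetplot(ego)  # foldChange is optional and expects a named numeric vector of fold changes, not the gene ID vector
7. GO graph plot (requires the additional topGO package)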
plotGOgraph(ego)
8. KEGG over-representation test
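kk <- enrichKEGG(gene = gene,  # Entrez gene IDs created in Section 1
organism = 'hsa',  # KEGG organism code; 'hsa' is human, change it to match your species
pvalueCutoff = 0.05)
dotplot(kk, showCategory=15)  # Dotplot visualization of enriched KEGG pathways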
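The `pAdjustMethod = "BH"` and `pvalueCutoff` arguments used above refer to Benjamini-Hochberg multiple-testing correction followed by a threshold. The sketch below shows that correction on its own with base R's `p.adjust`. The raw p-values and term names are made up, and it does not cover the separate `qvalueCutoff` filter.

```r
# Made-up raw p-values for five hypothetical terms
p_raw <- c(termA = 0.001, termB = 0.008, termC = 0.02,
           termD = 0.03,  termE = 0.30)

# Benjamini-Hochberg adjustment across all tested terms
p_adj <- p.adjust(p_raw, method = "BH")

# Keep the terms whose adjusted p-value falls below the cutoff
cutoff <- 0.05
significant <- names(p_raw)[p_adj < cutoff]

print(round(p_adj, 4))
print(significant)
```

Adjusted values are never smaller than the raw ones, so a term that passes a raw 0.05 threshold can still be filtered out after correction when many terms are tested.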
Related Resources
- ClusterProfiler Documentation: https://guangchuangyu.github.io/software/clusterProfiler/