clusterProfiler

The VICE Quick Start tutorial provides an introduction to VICE, a visual and interactive computing environment for running interactive apps.

Please work through the documentation and add your comments at the bottom of this page, or email comments to support@cyverse.org.

Rationale and background:

In recent years, high-throughput experimental techniques such as microarrays, RNA-Seq and mass spectrometry have made it possible to detect cellular molecules at the systems level. These analyses generate huge quantities of data, which need to be given a biological interpretation. A commonly used approach is clustering in the gene dimension, which groups genes based on their similarities (Yu et al. 2010).

To search for shared functions among genes, a common approach is to incorporate biological knowledge, such as Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG), to identify the predominant biological themes of a collection of genes. The clusterProfiler package implements methods to analyze and visualize functional profiles of genomic coordinates (supported by ChIPseeker), genes and gene clusters.

After clustering analysis, researchers want not only to determine whether a particular gene cluster has a common theme, but also to compare the biological themes among gene clusters. The manual step of choosing interesting clusters, followed by enrichment analysis of each selected cluster, is slow and tedious. To bridge this gap, we designed clusterProfiler (Yu et al. 2012) for comparing and visualizing functional profiles among gene clusters.

Pre-Requisites:

+ You will need a CyVerse account to complete this exercise (register here: https://user.cyverse.org).

+ Discovery Environment access - You will get access to Discovery Environment by default. Check the ‘My Services’ tab; if Discovery Environment is not listed, click the ‘Available’ tab, locate Discovery Environment and click the link to request access.

+ An up-to-date Java-enabled web browser (Firefox recommended). If you wish to work with your own large datasets and upload them using iCommands, Chrome is not suitable because of its issues with 64-bit Java.

Supported Analyses

Over-Representation Analysis
Gene Set Enrichment Analysis
Biological theme comparison

Species Supported:

Name	Database
Human	org.Hs.eg.db
Mouse	org.Mm.eg.db
Rhesus	org.Mmu.eg.db
Chimp	org.Pt.eg.db
Rat	org.Rn.eg.db
Yeast	org.Sc.sgd.db
Pig	org.Ss.eg.db
Xenopus	org.Xl.eg.db
Anopheles	org.Ag.eg.db
Arabidopsis	org.At.tair.db
Bovine	org.Bt.eg.db
Worm	org.Ce.eg.db
Canine	org.Cf.eg.db
Fly	org.Dm.eg.db
Zebrafish	org.Dr.eg.db
E coli strain K12	org.EcK12.eg.db
E coli strain Sakai	org.EcSakai.eg.db
Chicken	org.Gg.eg.db

1. Launching ClusterProfiler-VICE App

First, log in to the CyVerse DE, open the Apps window and search for ‘clusterProfiler’.

Login to CyVerse DE
Open the Apps window and search for 'clusterProfiler' Rstudio-VICE app

2. Launch analysis

Launch the app by selecting an example folder and then clicking "Launch Analysis". You can also select different input files and/or a different folder before launching.

Tip: You can use input files to import a script into the app.

Note: You will not see any files when selecting the folder if you selected input by folder. Rest assured that they will be there once the app begins to run.

3. Navigate to the clusterProfiler app URL

After the analysis starts running, open your notifications and click the ‘Access your running Analysis here’ URL.
At the new URL, enter ‘rstudio’ as both the username and the password.

4. Write/Run your code

In the RStudio script pane, you can write your code, generate plots, save plots and so on. You can find an example script below.

5. Complete your analysis

Complete your analysis by opening the Analyses window, selecting the clusterProfiler analysis and clicking the ‘Complete and Save Outputs’ option under the “Analyses” button. After you have done this, you can find any outputs you generated by following the same steps, but this time selecting ‘Go To Output Folder’.

Warning: Currently, a VICE app can run for at most 48 hours, after which it is terminated. Make sure you complete your analysis within 48 hours.

Example Analysis Script:

Section 1 - Gene Annotation

1. Load required libraries

            library("AnnotationDbi")
            library(clusterProfiler)
            library("org.Xx.eg.db")  # load the database package for your species, e.g. org.Hs.eg.db

2. Set annotation database

            OrgDb <- org.Xx.eg.db  # the database object has the same name as the package loaded in step 1
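The "org.Xx.eg.db" name in steps 1 and 2 is a placeholder for one of the packages in the species table. As a minimal sketch of steps 1 and 2 for a concrete species, the snippet below uses org.Hs.eg.db (human) purely as an example and assumes nothing about whether the package is already installed in the app:

```r
# Name of the annotation package for your species (see the species table);
# org.Hs.eg.db (human) is only an example choice here.
pkg <- "org.Hs.eg.db"

if (requireNamespace(pkg, quietly = TRUE)) {
  # Attach the package, then fetch the database object, which shares its name.
  library(pkg, character.only = TRUE)
  OrgDb <- get(pkg)
} else {
  # Not installed: Bioconductor packages are installed with BiocManager.
  message("Install it first with: BiocManager::install(\"", pkg, "\")")
}
```

Keeping the package name in one variable means the rest of the script only ever refers to OrgDb, so switching species is a one-line change.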

3. Read in your Gene list from file

            res <- read.table("GeneList.txt", header=TRUE)

4. Assign GeneId as row names

            row.names(res) <- res$Id
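Steps 3 and 4 assume that GeneList.txt is a whitespace-delimited table with a header row and an Id column of unique gene identifiers. The sketch below writes a tiny file of that shape to a temporary location and reads it back the same way; the three Ensembl IDs are real human genes, but the log2FoldChange column and its values are hypothetical and only there for illustration.

```r
# Write a small example gene list to a temporary file (values are made up).
gene_file <- tempfile(fileext = ".txt")
writeLines(c(
  "Id log2FoldChange",
  "ENSG00000141510 -1.8",
  "ENSG00000012048 2.3",
  "ENSG00000139618 1.1"
), gene_file)

# Read it back the same way the tutorial script does.
res <- read.table(gene_file, header = TRUE, stringsAsFactors = FALSE)

# Row names must be unique, so check for duplicated IDs before assigning them.
stopifnot(!anyDuplicated(res$Id))
row.names(res) <- res$Id

str(res)
```

If the duplicate check fails on your own file, remove or merge the repeated IDs first, because assigning duplicated row names raises an error.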

5. bitr: Biological Id TranslatoR

            gene.df <- bitr(res$Id, fromType = "ENSEMBL",
                                    toType = c("ENTREZID", "SYMBOL"),
                                    OrgDb = OrgDb)

6. Create gene list

            gene <- gene.df$ENTREZID
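bitr() can return more than one row for a single input ID and drops IDs it cannot map, so it can be worth inspecting gene.df before relying on the gene vector. The sketch below shows one way to do that in base R. It uses a made-up data frame with the same columns that the bitr() call above returns; the duplicated row and the unmapped ID are invented for illustration.

```r
# IDs that were submitted for translation (the last one is a fake, unmappable ID).
input_ids <- c("ENSG00000141510", "ENSG00000012048",
               "ENSG00000139618", "ENSG00000000000")

# Stand-in for the bitr() result; the second BRCA1 row is an invented
# one-to-many mapping.
gene.df <- data.frame(
  ENSEMBL  = c("ENSG00000141510", "ENSG00000012048",
               "ENSG00000012048", "ENSG00000139618"),
  ENTREZID = c("7157", "672", "999999", "675"),
  SYMBOL   = c("TP53", "BRCA1", "BRCA1_DUP", "BRCA2"),
  stringsAsFactors = FALSE
)

# Fraction of input IDs that received at least one mapping.
mapped_fraction <- mean(input_ids %in% gene.df$ENSEMBL)

# Keep only the first mapping for each input ID, then build the gene list.
gene.df.unique <- gene.df[!duplicated(gene.df$ENSEMBL), ]
gene <- unique(gene.df.unique$ENTREZID)

mapped_fraction
gene
```

Keeping the first mapping is only one possible policy; depending on the analysis, you may prefer to keep all mappings or to drop ambiguous IDs entirely.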

Section 2 - GO & KEGG Analysis

1. GO classification - gene classification based on GO distribution at a specific level.

            ggo <- groupGO(gene     = gene,
                           OrgDb    = OrgDb,
                           ont      = "BP",
                           level    = 3,
                           readable = TRUE)

2. GO classification plot

           barplot(ggo, drop=TRUE, showCategory=12)

3. GO over-representation test

           ego <- enrichGO(gene          = gene,
                           OrgDb         = OrgDb,
                           ont           = "BP",
                           pAdjustMethod = "BH",
                           pvalueCutoff  = 0.05,
                           qvalueCutoff  = 0.05,
                           readable      = TRUE)
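For intuition about what the over-representation test computes: the clusterProfiler documentation describes it as based on the hypergeometric distribution, with the "BH" setting applying a Benjamini-Hochberg correction across terms. The sketch below works through the calculation for a single term in base R. All counts and the extra p-values are made up, and it illustrates the statistics only, not clusterProfiler's internal code.

```r
# Made-up counts for a single GO term (for illustration only).
N <- 18000  # background genes with any annotation
M <- 200    # background genes annotated to the term
n <- 300    # genes in the input list
k <- 12     # input genes annotated to the term

# P(X >= k): probability of drawing at least k annotated genes by chance.
p_hyper <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

# The same p-value from a one-sided Fisher's exact test on the 2x2 table
# (rows: annotated / not annotated; columns: in list / not in list).
tab <- matrix(c(k, n - k, M - k, N - M - (n - k)), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# Benjamini-Hochberg adjustment across several terms
# (the other three p-values are made up).
p_adj <- p.adjust(c(p_hyper, 0.04, 0.20, 0.75), method = "BH")

p_hyper
p_adj
```

With these counts about 3.3 annotated genes would be expected by chance (300 x 200 / 18000), so observing 12 gives a small p-value.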

4. Barplot visualization of GO enriched genes

           barplot(ego, showCategory=25)

5. Dotplot visualization of GO enriched genes

           dotplot(ego, showCategory=25)

6. Complex association plot

           cnetplot(ego)  # to color gene nodes, pass foldChange a named numeric vector of fold changes
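The foldChange argument of cnetplot() is meant to receive a named numeric vector (fold-change values named by gene ID), not the character vector of IDs held in gene. The sketch below builds such a vector in base R. It assumes the input table has a log2FoldChange column, which is a hypothetical column name not present in the tutorial, and all values are made up.

```r
# Stand-in for the gene table read in Section 1; log2FoldChange is a
# hypothetical column and its values are invented.
res <- data.frame(
  Id = c("ENSG00000141510", "ENSG00000012048", "ENSG00000139618"),
  log2FoldChange = c(-1.8, 2.3, 1.1),
  stringsAsFactors = FALSE
)

# Stand-in for the bitr() result from Section 1.
gene.df <- data.frame(
  ENSEMBL  = c("ENSG00000141510", "ENSG00000012048", "ENSG00000139618"),
  ENTREZID = c("7157", "672", "675"),
  stringsAsFactors = FALSE
)

# Attach the Entrez ID to each fold change, then name the values by Entrez ID.
merged <- merge(res, gene.df, by.x = "Id", by.y = "ENSEMBL")
fc <- setNames(merged$log2FoldChange, merged$ENTREZID)

# Sort in decreasing order.
fc <- sort(fc, decreasing = TRUE)
fc

# In a real session, with ego computed as in step 3:
# cnetplot(ego, foldChange = fc)
```

A decreasing-sorted named vector of this kind is also the input shape that ranked-list methods such as gseGO() expect for Gene Set Enrichment Analysis.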

7. GO graph (requires the topGO package)

           plotGOgraph(ego)

8. KEGG over-representation test

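           kk <- enrichKEGG(gene         = gene,   # Entrez IDs from Section 1, step 6
                            organism     = 'hsa',  # KEGG organism code; 'hsa' is human
                            pvalueCutoff = 0.05)

9. Dotplot visualization of KEGG enriched pathways

           dotplot(kk, showCategory=15)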
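To have something to collect in the output folder when the analysis is completed, result tables and plots need to be written to files. The sketch below shows one way to do that in base R. The table is a made-up stand-in for as.data.frame(ego), the output directory is a temporary one chosen for illustration, and the assumption that files under your session's working directory are the ones saved by VICE should be checked against the CyVerse documentation.

```r
# Stand-in for as.data.frame(ego); the p.adjust values are made up.
ego_table <- data.frame(
  ID          = c("GO:0006281", "GO:0006974"),
  Description = c("DNA repair", "DNA damage response"),
  p.adjust    = c(0.001, 0.004),
  stringsAsFactors = FALSE
)

# Output directory (a temporary one here; use a folder in your session instead).
out_dir <- file.path(tempdir(), "clusterProfiler_output")
dir.create(out_dir, showWarnings = FALSE)

# Save the result table as CSV.
csv_path <- file.path(out_dir, "GO_enrichment.csv")
write.csv(ego_table, csv_path, row.names = FALSE)

# Save a plot as PDF: open the device, draw, then close the device.
pdf_path <- file.path(out_dir, "GO_barplot.pdf")
pdf(pdf_path, width = 7, height = 5)
barplot(-log10(ego_table$p.adjust), names.arg = ego_table$ID,
        ylab = "-log10(adjusted p-value)")
dev.off()

# In a real session, draw the tutorial's plots between pdf() and dev.off(),
# for example: print(dotplot(ego, showCategory = 25))
list.files(out_dir)
```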

Related Resources

ClusterProfiler Documentation: https://guangchuangyu.github.io/software/clusterProfiler/