
In recent years, high-throughput experimental techniques such as microarrays, RNA-Seq, and mass spectrometry have made it possible to detect cellular molecules at the systems level. These analyses generate huge quantities of data that need to be given a biological interpretation. A commonly used approach is to cluster in the gene dimension, grouping genes based on their similarities (Yu et al. 2010).

To search for shared functions among genes, a common approach is to incorporate biological knowledge, such as Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG), to identify the predominant biological themes of a collection of genes. The clusterProfiler package implements methods to analyze and visualize functional profiles of genomic coordinates (supported by ChIPseeker), genes, and gene clusters.

After clustering analysis, researchers want not only to determine whether a particular gene cluster has a common theme, but also to compare biological themes among gene clusters. Manually choosing interesting clusters and then running enrichment analysis on each selected cluster is slow and tedious. To bridge this gap, we designed clusterProfiler (Yu et al. 2012) for comparing and visualizing functional profiles among gene clusters.

 

Prerequisites:

+ You will need a CyVerse account to complete this exercise (register here: https://user.cyverse.org).

+ Discovery Environment access - You will get access to the Discovery Environment by default. Check the ‘My Services’ tab; if Discovery Environment is not listed, click the ‘Available’ tab, locate Discovery Environment, and click the link to request access.

+ An up-to-date Java-enabled web browser. (Firefox is recommended. If you wish to work with your own large datasets and upload them using iCommands, Chrome is not suitable because of its issues with 64-bit Java.)

Supported Analyses

  • Over-Representation Analysis
  • Gene Set Enrichment Analysis
  • Biological theme comparison
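
To make the first item concrete: over-representation analysis is commonly computed as a one-sided hypergeometric test per gene set, followed by multiple-testing correction. The sketch below uses base R only and invented counts; the helper name `ora_pvalue` and all numbers are illustrative assumptions, not part of the clusterProfiler API.

```r
# Hypothetical helper: one-sided hypergeometric p-value for a single gene set.
#   k = genes of interest that fall in the gene set
#   n = size of the gene list of interest
#   M = background genes annotated to the gene set
#   N = size of the background
ora_pvalue <- function(k, n, M, N) {
  # P(X >= k) where X ~ Hypergeometric(white = M, black = N - M, draws = n)
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
}

# Invented counts for three gene sets
p <- c(setA = ora_pvalue(k = 12, n = 100, M = 200, N = 20000),
       setB = ora_pvalue(k = 2,  n = 100, M = 300, N = 20000),
       setC = ora_pvalue(k = 0,  n = 100, M = 50,  N = 20000))

# Benjamini-Hochberg adjustment across the tested gene sets
p_adj <- p.adjust(p, method = "BH")
print(p_adj)
```

A gene set with far more hits than expected by chance (setA here) gets a very small p-value, while a set with no hits (setC) gets a p-value of 1.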

Species Supported:

Name                   Database
Human                  org.Hs.eg.db
Mouse                  org.Mm.eg.db
Rhesus                 org.Mmu.eg.db
Chimp                  org.Pt.eg.db
Rat                    org.Rn.eg.db
Yeast                  org.Sc.sgd.db
Pig                    org.Ss.eg.db
Xenopus                org.Xl.eg.db
Anopheles              org.Ag.eg.db
Arabidopsis            org.At.tair.db
Bovine                 org.Bt.eg.db
Worm                   org.Ce.eg.db
Canine                 org.Cf.eg.db
Fly                    org.Dm.eg.db
Zebrafish              org.Dr.eg.db
E coli strain K12      org.EcK12.eg.db
E coli strain Sakai    org.EcSakai.eg.db
Chicken                org.Gg.eg.db
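
One way to avoid hard-coding the database name in a script is to look it up by species. The following is a minimal sketch in base R; the vector `org_packages` copies a subset of the table above, and the helper `orgdb_package` is a hypothetical name introduced only for illustration.

```r
# Species-to-package lookup, copied from the table above (subset shown)
org_packages <- c(
  Human       = "org.Hs.eg.db",
  Mouse       = "org.Mm.eg.db",
  Rat         = "org.Rn.eg.db",
  Yeast       = "org.Sc.sgd.db",
  Arabidopsis = "org.At.tair.db",
  Zebrafish   = "org.Dr.eg.db"
)

# Hypothetical helper: return the package name for a species,
# or stop with a clear message if the species is not listed
orgdb_package <- function(species) {
  if (!species %in% names(org_packages)) {
    stop("No annotation package listed for species: ", species)
  }
  unname(org_packages[species])
}

pkg <- orgdb_package("Mouse")
print(pkg)

# Loading is left commented out because it needs the Bioconductor package installed:
# library(pkg, character.only = TRUE)
# OrgDb <- get(pkg)
```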

1. Launching the clusterProfiler RStudio-VICE app

  1. Log in to the CyVerse DE.
  2. Open the Apps window and search for the 'clusterProfiler' RStudio-VICE app.

2. Launch analysis

Launch the app by selecting an example folder and then clicking "Launch Analysis". You can also select different input files and/or folders and then launch the app.

 

Tip: You can use input files to import a script into the app.

 

Note: If you select input by folder, you will not see any files when selecting the folder. Rest assured that they will be there once the app begins to run.

3. Navigate to the clusterProfiler app URL

  • After the analysis starts running, open your notifications and click the ‘Access your running Analysis here’ URL.
  • On the page that opens, enter ‘rstudio’ as both the username and the password.

4. Write/Run your code

In the RStudio script pane, you can write your code, generate plots, save plots, and so on. You can find an example script below.

5. Complete your analysis

Complete your analysis by opening the Analyses window, selecting the clusterProfiler analysis, and clicking the ‘Complete and Save Outputs’ option under the “Analyses” button. After you have done this, you can find any outputs you generated by following the same steps, but this time selecting ‘Go To Output Folder’.

 

Warning: Currently, VICE apps can run for at most 48 hours, after which they are terminated. Make sure your analysis finishes within 48 hours.
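
The "Write/Run your code" step mentions saving plots from the script. Below is a minimal sketch using base R only, assuming you want tables and figures written as files under the working directory. The folder name `results`, the file names, and the toy data frame are illustrative assumptions; in a real session the table could instead come from something like `as.data.frame(ego)`.

```r
# Write outputs into a dedicated folder in the working directory.
# "results" is an assumed folder name.
out_dir <- "results"
dir.create(out_dir, showWarnings = FALSE)

# Stand-in data for an enrichment result table
toy_result <- data.frame(Description = c("term A", "term B"),
                         Count = c(12, 5))

# Save the table as CSV
write.csv(toy_result, file.path(out_dir, "enrichment_table.csv"),
          row.names = FALSE)

# Save a plot: open a PDF device, draw, then close the device to flush the file
pdf(file.path(out_dir, "enrichment_barplot.pdf"), width = 7, height = 5)
barplot(toy_result$Count, names.arg = toy_result$Description, horiz = TRUE)
dev.off()
```

If the plotting functions in your installed version return ggplot objects, wrapping the plot call in `print()` inside the device, or using `ggplot2::ggsave()`, may be needed when the script is run non-interactively.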

Example Analysis Script:

Section 1 - Gene Annotation

1. Load required libraries
           library(AnnotationDbi)
           library(clusterProfiler)
           # org.Xx.eg.db is a placeholder: substitute the database for your
           # species from the Species Supported table
           library(org.Xx.eg.db)
2. Set annotation database
           OrgDb <- org.Xx.eg.db
3. Read in your gene list from file
           res <- read.table("GeneList.txt", header=TRUE)
4. Assign gene IDs as row names
           row.names(res) <- res$Id
5. bitr: Biological Id TranslatoR
           gene.df <- bitr(res$Id, fromType = "ENSEMBL",
                           toType = c("ENTREZID", "SYMBOL"),
                           OrgDb = OrgDb)
6. Create gene list
           gene <- gene.df$ENTREZID
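
ID translation is rarely one-to-one, so it can be worth checking the mapping table before moving on. The sketch below runs on a toy data frame shaped like the `bitr()` output (columns `ENSEMBL`, `ENTREZID`, `SYMBOL`); all IDs and symbols are invented for illustration, and base R is the only requirement.

```r
# Toy input IDs and a toy mapping table shaped like the bitr() output
input_ids <- c("ENSG000001", "ENSG000002", "ENSG000003", "ENSG000004")
gene.df <- data.frame(
  ENSEMBL  = c("ENSG000001", "ENSG000002", "ENSG000002", "ENSG000004"),
  ENTREZID = c("101", "102", "103", "104"),
  SYMBOL   = c("GENEA", "GENEB", "GENEB2", "GENED"),
  stringsAsFactors = FALSE
)

# Input IDs that have no row in the mapping table
unmapped <- setdiff(input_ids, gene.df$ENSEMBL)

# Input IDs that map to more than one Entrez ID
multi <- unique(gene.df$ENSEMBL[duplicated(gene.df$ENSEMBL)])

# Unique Entrez IDs to pass on to the enrichment functions
gene <- unique(gene.df$ENTREZID)

cat("unmapped:", unmapped, "\n")
cat("multi-mapped:", multi, "\n")
cat("genes kept:", length(gene), "\n")
```

How to treat multi-mapped IDs (keep all, keep the first, or drop) is a judgment call that depends on the dataset; the sketch only reports them.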

Section 2 - GO & KEGG Analysis

1. GO classification - gene classification based on GO distribution at a specific level.
           ggo <- groupGO(gene     = gene,
                          OrgDb    = OrgDb,
                          ont      = "BP",
                          level    = 3,
                          readable = TRUE)
2. GO classification plot
           barplot(ggo, drop=TRUE, showCategory=12)
3. GO over-representation test
           ego <- enrichGO(gene          = gene,
                           OrgDb         = OrgDb,
                           ont           = "BP",
                           pAdjustMethod = "BH",
                           pvalueCutoff  = 0.05,
                           qvalueCutoff  = 0.05,
                           readable      = TRUE)
4. Barplot visualization of GO enriched genes
           barplot(ego, showCategory=25)
5. Dotplot visualization of GO enriched genes
           dotplot(ego, showCategory=25)
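6. Complex association plot
           # foldChange expects a named numeric vector of fold changes, which this
           # script does not build, so the plot is drawn without it
           cnetplot(ego)
7. GO graph (requires the topGO package)
           plotGOgraph(ego)
8. KEGG over-representation test
           # 'hsa' is the KEGG code for human; change it to match your species
           kk <- enrichKEGG(gene = gene,
                            organism = 'hsa',
                            pvalueCutoff = 0.05)
9. Dotplot visualization of KEGG enriched pathways
           dotplot(kk, showCategory=15)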
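
Gene Set Enrichment Analysis and fold-change coloring in network plots both work from a ranked, named numeric vector rather than a plain list of IDs. The following is a minimal base R sketch of building one. The column name `log2FoldChange` and the toy values are assumptions, since the example input file above is not described in that detail.

```r
# Toy table; "log2FoldChange" is an assumed column name and the IDs are invented
toy <- data.frame(
  ENTREZID = c("101", "102", "103", "104"),
  log2FoldChange = c(1.5, -2.0, 0.3, 2.2),
  stringsAsFactors = FALSE
)

# Named numeric vector: values are fold changes, names are Entrez IDs
geneList <- toy$log2FoldChange
names(geneList) <- toy$ENTREZID

# Rank from most up-regulated to most down-regulated
geneList <- sort(geneList, decreasing = TRUE)
print(geneList)

# With real data, such a vector could then be passed on, for example
# (argument names may differ between package versions):
# gse <- gseGO(geneList = geneList, OrgDb = OrgDb, ont = "BP")
# cnetplot(ego, foldChange = geneList)
```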

 

Related Resources