RNA-Seq studies with the almond tree -- an example.

Recently an RNA-Seq study of the almond tree (Prunus dulcis) was published in PLoS ONE: "De Novo Transcriptome Assembly and Comparative Analysis of Differentially Expressed Genes in Prunus dulcis Mill. in Response to Freezing Stress" by Sadegh Mousavi et al. (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0104541). This sort of study requires a different approach from the more commonly used Tuxedo pipeline (TopHat/Cufflinks/Cuffdiff, etc.), because apparently no good reference genome is available for this species. Many researchers face this situation. To address it, iPlant's Discovery Environment platform (https://de.iplantcollaborative.org/de/Discoveryenvironment.jsp#workspace) includes all or almost all of the tools needed for RNA-Seq studies of a species without a reference genome. A rough tutorial describes step by step how to do this sort of analysis (Tutorial: Characterizing Differential Expression With RNA-Seq (Without Reference Genome)), but here we go through the basic steps of the analysis just as it was done in the almond tree study.

These are the basic steps performed by the authors for their study of the almond tree transcriptome:

  1. FastQC Toolkit
  2. Fastx Toolkit
  3. Trinity
  4. BLASTx
  5. RSEM Package
  6. EBSeq Package
  7. AgriGo (web tool)
  8. IDEG6  (web tool)

FastQC and the Fastx toolkit are available in the iPlant Discovery Environment (DE), but I would recommend a more efficient set of tools. A relatively new set of tools in the DE, the HTProcess series of apps, treats whole libraries of reads at once. Learn more about this read cleanup workflow here: Cleaning up your reads with the HTProcess Pipeline.

The Trinity transcriptome assembler is commonly used and is available in the DE as an app (see Using Trinity in the Discovery Environment). Because of the computational and memory demands of this application, jobs submitted to Trinity in the DE are sent to run on a large XSEDE server (seamlessly, behind the scenes), which provides the necessary resources. When the results are ready, they are returned to the user's iPlant Data Store. To improve the effectiveness of the assembly, I would also recommend running Trinity normalize_by_kmer_coverage, a Trinity-related tool for preparing the reads prior to assembly. If the user has problems with Trinity, which can be temperamental, the DE offers other transcriptome assemblers to try, including Soap-denovo-Trans and Oases.

Once the assembly is complete, there are tools in the DE for cleaning it up and evaluating it. A tutorial describes some of the options for transcriptome assembly, evaluation, and cleanup in the DE (http://www.iplantcollaborative.org/learning-center/discovery-environment/de-005-transcriptome-assembly-de-novo). The tutorial includes a step to run Transcript decoder, which finds the likely coding sequences among the transcript contigs and translates them. This lets the user run BLASTp or Delta-BLAST for the annotation step rather than BLASTx, which is very slow and requires an enormous amount of compute time.
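To make the idea of "finding coding sequences and translating them" concrete, here is a minimal sketch in Python. It is not the algorithm used by Transcript decoder; it only assumes the simplest possible rule (take the longest stretch that starts at a methionine and runs to a stop codon or the end of the contig, over all six reading frames, using the standard genetic code). The function names `translate` and `longest_orf` are made up for this illustration.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_orf(contig):
    """Return the longest peptide starting at M, searched over six frames.

    Hypothetical helper for illustration only: real ORF callers also score
    coding potential, which this sketch does not attempt.
    """
    seq = contig.upper()
    strands = (seq, seq.translate(COMPLEMENT)[::-1])  # forward, reverse complement
    best = ""
    for strand in strands:
        for frame in range(3):
            # Stop codons ('*') split the translation into candidate segments.
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best


if __name__ == "__main__":
    print(longest_orf("CCATGGCTTAAGG"))
```

The peptides produced this way are what a protein-vs-protein search such as BLASTp takes as queries, which is why this step makes the slower translated search unnecessary.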

Both BLASTp and Delta-BLAST are available in the DE. Delta-BLAST, in our tests, has been very sensitive and uses an interesting algorithm for finding conserved peptide domains within a sequence to aid in sequence mapping. We also suggest using a good database for post-BLAST GO annotation, such as UniProt. UniProtKB provides a great resource for selecting a subset of the UniProt database, so that the user can run the BLAST step faster and target it to the sequences that are most relevant (http://www.uniprot.org/uniprot/). The user can download any database and upload it into the DE. There is a tool in the DE to make a BLAST database out of a FASTA file. In the near future, there will be even more tools and databases in the DE and its Data Store. A tutorial on transcriptome annotation within the DE is available here: BLAST a Transcriptome. A tool that makes it easy to BLAST against the UniProt database is "Blastp a subset of uniprot". A related tool provides Gene Ontology information for many of the BLASTp matches in UniProt: "Add GO to Blastp-uniprot output".
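As a rough illustration of what comes after the BLAST step, the sketch below keeps the single best hit per query. It assumes the results are in the standard 12-column tabular format of BLAST+ (`-outfmt 6`); whether the DE apps produce exactly this format is an assumption, and the function name `best_hits` and the E-value cutoff are chosen only for this example.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(handle, max_evalue=1e-5):
    """Map each query ID to (subject ID, bit score, E-value) of its best hit.

    `max_evalue` is an illustrative threshold, not one taken from the study.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        evalue = float(hit["evalue"])
        bitscore = float(hit["bitscore"])
        if evalue > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or bitscore > current[1]:
            best[hit["qseqid"]] = (hit["sseqid"], bitscore, evalue)
    return best


if __name__ == "__main__":
    # Made-up rows, only to show the expected shape of the input.
    example = io.StringIO(
        "contig1\tprotA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200\n"
        "contig1\tprotB\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t150\n"
        "contig2\tprotC\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.5\t30\n"
    )
    for query, (subject, score, evalue) in best_hits(example).items():
        print(query, subject, score, evalue)
```

The subject IDs kept this way are the ones a GO-annotation step would then look up.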

Part of the transcript quality evaluation used in the almond tree study was a step for mapping the reads directly to the transcript contigs to determine the percentage of reads that actually map to the assembly. At this stage, it is good to use the Bowtie Build-and-Map app in the DE (the one for use with RSEM, NOT Bowtie2 Build-and-Map). To find the information on mapping efficiency, look at the condor-stderr-0 file in the logs directory of the output. It is a simple text file that contains the mapping stats. The mapping output can also be used to run RSEM, which is also available in the DE, after first running the app Samtools-0.1.19 SAM-to-sorted-BAM (with the box checked for name-sorting). RSEM reports the FPKM values for the various transcript contigs. Then, following the methods described in the tutorial for RNA-Seq without a reference genome (or the Universal Method), construct a matrix table to be used with EdgeR. All the tools are available in the DE. [ A hint: when looking for a tool in the DE, just type its name in the search box at the top of the apps window. ]
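If there are many libraries, reading each log by eye gets tedious. The sketch below pulls the mapping rate out of the log text. It assumes the file contains the usual Bowtie 1 summary lines ("# reads processed: ...", "# reads with at least one reported alignment: ...", "# reads that failed to align: ..."); check one of your own condor-stderr-0 files first, since the exact contents depend on how the app runs Bowtie. The function name `mapping_stats` is made up for this example.

```python
import re

# Summary lines as Bowtie 1 normally prints them to stderr (assumed format).
PATTERNS = {
    "processed": re.compile(r"# reads processed: (\d+)"),
    "aligned": re.compile(r"# reads with at least one reported alignment: (\d+)"),
    "failed": re.compile(r"# reads that failed to align: (\d+)"),
}


def mapping_stats(log_text):
    """Extract read counts from the log text and compute the percent aligned."""
    stats = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(log_text)
        if match:
            stats[key] = int(match.group(1))
    if stats.get("processed"):
        stats["percent_aligned"] = 100.0 * stats.get("aligned", 0) / stats["processed"]
    return stats


if __name__ == "__main__":
    # Made-up numbers, only to show the shape of the input.
    example_log = (
        "# reads processed: 1000\n"
        "# reads with at least one reported alignment: 800 (80.00%)\n"
        "# reads that failed to align: 200 (20.00%)\n"
        "Reported 800 alignments to 1 output stream(s)\n"
    )
    print(mapping_stats(example_log))
```

In practice you would read the text with `open("logs/condor-stderr-0").read()` (path shown as an example) and pass it to the function.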

EBSeq is not yet available in the DE, but differential expression analysis can be done using the EdgeR app in the DE. At this point the user will have the information about statistically significant differences in gene or isoform expression for the samples studied. AgriGo and IDEG6 are web tools, and since the scale of the analysis at this stage is small, web tools can be quite useful. The DE also has other annotation and gene ontology tools that the user might find interesting, including GOSeq, GeneMania Query Runner, MetaGeneAnnotator, Gfold 1.1.1 Difference Expression, Ontologizer, and PAGE.
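Going back to the matrix table that EdgeR takes as input: the sketch below shows one way such a table could be assembled from per-sample RSEM results. It assumes tab-delimited RSEM result files with a header row containing `transcript_id` and `expected_count` columns; the tutorial's own method may differ, and the function names and sample names here are hypothetical. The sketch uses expected counts because edgeR is normally given counts; pass `value_column="FPKM"` if you want a table of FPKM values instead.

```python
import csv
import io
import sys


def build_matrix(sample_handles, id_column="transcript_id",
                 value_column="expected_count"):
    """Combine per-sample RSEM tables into {transcript: [value per sample]}.

    `sample_handles` maps a sample name to an open file (or file-like object).
    Transcripts missing from a sample keep the default value 0.0.
    """
    samples = list(sample_handles)
    matrix = {}
    for index, name in enumerate(samples):
        for row in csv.DictReader(sample_handles[name], delimiter="\t"):
            values = matrix.setdefault(row[id_column], [0.0] * len(samples))
            values[index] = float(row[value_column])
    return samples, matrix


def write_matrix(samples, matrix, out):
    """Write the matrix as a tab-delimited table, one transcript per row."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["transcript_id"] + samples)
    for transcript_id in sorted(matrix):
        writer.writerow([transcript_id] + matrix[transcript_id])


if __name__ == "__main__":
    # Two made-up samples standing in for real RSEM output files.
    control = io.StringIO("transcript_id\texpected_count\nc1\t10\nc2\t0\n")
    treated = io.StringIO("transcript_id\texpected_count\nc1\t4\nc3\t7\n")
    names, table = build_matrix({"control": control, "treated": treated})
    write_matrix(names, table, sys.stdout)
```

With real data, replace the `io.StringIO` objects with `open(...)` calls on each sample's RSEM results file.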

These are the steps supported by iPlant's Discovery Environment that can be used to reproduce the almond tree study:

  1. HTProcess-Prepare_Directories-and-Run_FastQC-0.1
  2. HTProcess_Trimmomatic_0.32
  3. Trinity normalize_by_kmer_coverage
  4. Trinity R2013-8-14
  5. Transcript decoder 1.0
  6. Blastp a subset of uniprot
  7. Add GO to Blastp-uniprot output
  8. Bowtie-Build-and-Map
  9. SAMTOOLS-0.1.19 SAM-to-sorted-BAM
  10. RSEM-1.2.12
  11. EdgeR
  12. AgriGo (web tool)
  13. IDEG6 (web tool) 

The iPlant Collaborative provides a wide range of tools for analysis of RNA-Seq data, including those needed to largely duplicate the almond tree study of Sadegh Mousavi et al., but iPlant resources support many other types of studies as well. Most of these resources are found within the Discovery Environment, but additional studies can be done with the cloud computing resources provided by Atmosphere. For further questions, or if problems arise, check first in the iPlant Collaborative's forum, ask.iplantcollaborative.org.

 
