Before Beginning, Please Create a CyVerse Account
One of the unique features of fRNAkenseq is that it is well integrated with the computational infrastructure of CyVerse (formerly the iPlant Collaborative). This allows communication and data sharing between fRNAkenseq and other Powered-by-CyVerse sequence analysis applications. You log into fRNAkenseq with your CyVerse credentials. Once you have created a CyVerse account, you will be able to use fRNAkenseq in addition to other Powered-by-CyVerse tools such as CoGE, the syntenic analysis tool. This data cross-talk is useful because it provides a single entry point to multiple types of analysis. Because of the way fRNAkenseq is configured, after creating a CyVerse account you will need to log into CoGE before logging into fRNAkenseq.
CyVerse Account Creation:
https://user.cyverse.org/register/
CoGE Link:
https://genomevolution.org/coge/
INTRODUCTION TO RNA-SEQ AND INFORMATICS ANALYSIS
Transcriptome bioinformatics begins with the sequencing of a cDNA library, the results of which are typically returned by a sequencing center as a FASTQ file. During library preparation, mRNA is reverse transcribed into complementary DNA (cDNA). Once sequenced, each cDNA read can be aligned to a reference genome to determine the gene from which it was most likely transcribed. Because the number of cDNA molecules derived from a given gene should be proportional to the number of RNA molecules transcribed from it, the number of reads aligning to that gene provides a reasonable measure of its relative level of expression. A typical FASTQ file contains anywhere from 10 to 100 million reads. RNA-seq analysis can be thought of as three conceptual steps: (1) mapping each read to the gene it was transcribed from, (2) counting the reads aligned to each gene to estimate that gene's relative expression level, and (3) determining whether any gene(s) show a significant change in relative expression across experimental conditions. Accurate quantification is complicated, however: it requires normalization for library size (the total number of reads sequenced per sample), as well as specialized statistics to determine whether a given gene is enriched in a certain condition.
fRNAkenseq is designed to help researchers make sense of high-dimensional transcriptome data and determine which genes are most likely to be truly enriched. It does this by providing enrichment predictions from several popular algorithms; the researcher can then choose candidates for validation from the intersection of the gene lists predicted to be enriched by each algorithm. The mapping and counting steps (1 and 2) are conducted in the MapCount section of fRNAkenseq, and the enrichment prediction step (3) is completed in the DiffExpress section.
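To make steps (1) and (2) and the final intersection of gene lists more concrete, here is a minimal Python sketch. It assumes each read has already been assigned to a gene (the hypothetical list `read_assignments`, with `None` marking an unmapped read), uses counts per million (CPM) as a simple stand-in for library-size normalization, and treats each algorithm's output as a plain list of gene names. It does not attempt the statistical testing of step (3), and it is not how fRNAkenseq or the algorithms it runs compute their results.

```python
from collections import Counter


def count_reads_per_gene(read_assignments):
    """Step 2a: tally how many reads were assigned to each gene.

    `read_assignments` is a hypothetical input: one gene name per read,
    with None for reads that did not map to any gene (they are skipped).
    """
    return Counter(gene for gene in read_assignments if gene is not None)


def counts_per_million(counts):
    """Step 2b: scale raw counts by library size (counts per million, CPM).

    Library size here is the total number of reads assigned to genes.
    CPM is only one simple normalization; it is an illustrative stand-in.
    """
    library_size = sum(counts.values())
    if library_size == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: n * 1_000_000 / library_size for gene, n in counts.items()}


def consensus_enriched(gene_lists):
    """Genes predicted as enriched by every algorithm (set intersection)."""
    sets = [set(genes) for genes in gene_lists]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    # Toy data: one library's read-to-gene assignments (None = unmapped).
    reads = ["geneA", "geneA", "geneB", None, "geneC", "geneA"]
    counts = count_reads_per_gene(reads)
    print(counts_per_million(counts)["geneA"])  # 600000.0

    # Toy enriched-gene lists from three hypothetical algorithms.
    lists = [["geneA", "geneB"], ["geneA", "geneC"], ["geneA", "geneB"]]
    print(consensus_enriched(lists))  # {'geneA'}
```

Raw counts are kept separate from the CPM values because count-based differential expression tools generally expect raw counts as input and apply their own normalization internally.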
DESCRIPTION OF WORKFLOW AND ALGORITHMS
Using fRNAkenseq, affectionately abbreviated fRNAk, is simple. To complete the first two steps of RNA-seq analysis (mapping and transcript quantification), navigate from the main page to MapCount and select the libraries for which you want to quantify gene expression. Next, choose the genome of the organism your samples came from; it will be pulled from the databank of over 20,000 FASTA and annotation pairs available to fRNAk. fRNAk indexes the selected genome with Bowtie2 (Langmead et al., 2012), because the TopHat2 mapping algorithm requires a Bowtie2 index of the genome. Finally, choose the number of processors to devote to the mapping algorithms so that their operations can run in parallel.
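For readers who want to see roughly what the index-then-map stage corresponds to on the command line, the sketch below builds (but does not run) one `bowtie2-build` command for the genome and one `tophat2` command per library, passing the chosen processor count through TopHat2's `-p` option. The file names `genome.fa`, `genes.gtf`, `ctrl_1.fastq`, `treat_1.fastq`, the `frnak_run` directory, and the helper functions are all hypothetical, and fRNAk's actual internal invocation is not documented here, so treat this as an illustration only. To execute the commands, each list could be passed to `subprocess.run(cmd, check=True)` on a machine with Bowtie2 and TopHat2 installed.

```python
import shlex
from pathlib import Path


def build_index_command(genome_fasta, index_prefix):
    """bowtie2-build <reference FASTA> <index basename>."""
    return ["bowtie2-build", str(genome_fasta), str(index_prefix)]


def build_tophat2_command(index_prefix, fastq, annotation_gtf, out_dir, processors):
    """One TopHat2 run for a single library (hypothetical helper)."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return [
        "tophat2",
        "-p", str(processors),      # number of threads used for alignment
        "-G", str(annotation_gtf),  # gene annotation (GTF/GFF3)
        "-o", str(out_dir),         # per-library output directory
        str(index_prefix),          # Bowtie2 index basename built above
        str(fastq),                 # reads for this library
    ]


def plan_mapping(genome_fasta, annotation_gtf, libraries, processors, work_dir="frnak_run"):
    """Index the genome once, then map every selected library against it."""
    work = Path(work_dir)
    index_prefix = work / "genome_index"
    commands = [build_index_command(genome_fasta, index_prefix)]
    for fastq in libraries:
        sample = Path(fastq).name.split(".")[0]  # e.g. "ctrl_1" from "ctrl_1.fastq"
        out_dir = work / f"tophat_{sample}"
        commands.append(
            build_tophat2_command(index_prefix, fastq, annotation_gtf, out_dir, processors)
        )
    return commands


if __name__ == "__main__":
    plan = plan_mapping("genome.fa", "genes.gtf", ["ctrl_1.fastq", "treat_1.fastq"], processors=4)
    for cmd in plan:
        print(shlex.join(cmd))  # print only; nothing is executed
```

The index is built once per genome and reused for every library, which is why only the TopHat2 step repeats as more libraries are selected.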
...