RMTA v2.6.3
The DE Quick Start tutorial provides an introduction to basic DE functionality and navigation.
Please work through the documentation and add your comments at the bottom of this page, or email comments to support@cyverse.org.
Rationale and background:
RMTA is a high-throughput RNA-seq read mapping and transcript assembly workflow. RMTA incorporates the standard RNA-seq analysis programs traditionally used one at a time into a single, easy-to-use workflow that can rapidly assemble and process any amount of local (FASTQ) or NCBI-stored (SRA) RNA-seq data.
RMTA maps reads to a user-provided reference genome using either HISAT2 (transcript analysis) or Bowtie2 (SNP analysis), assembles transcripts using StringTie, and then performs read quantification using featureCounts.
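As a rough illustration of the genome-guided steps described above, the sketch below builds the kind of command lines these tools accept for a paired-end sample. It is not RMTA's actual implementation: the `build_commands` helper, the file names, the index prefix, and the intermediate samtools sorting step are all assumptions made for illustration, and nothing is executed.

```python
from typing import List


def build_commands(genome_fa: str, annotation_gtf: str,
                   reads_1: str, reads_2: str,
                   sample: str = "sample", threads: int = 4,
                   strandedness: int = 0) -> List[List[str]]:
    """Return illustrative command lines for one paired-end sample.

    strandedness follows the featureCounts -s convention:
    0 = unstranded, 1 = stranded, 2 = reversely stranded.
    """
    index = f"{sample}_index"       # hypothetical HISAT2 index prefix
    sam = f"{sample}.sam"           # hypothetical intermediate file names
    bam = f"{sample}.sorted.bam"
    return [
        # 1. Index the reference genome.
        ["hisat2-build", genome_fa, index],
        # 2. Map reads; --dta tailors alignments for transcript assemblers.
        ["hisat2", "--dta", "-p", str(threads), "-x", index,
         "-1", reads_1, "-2", reads_2, "-S", sam],
        # 3. Sort alignments by coordinate (assumed intermediate step).
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        # 4. Assemble transcripts, guided by the reference annotation.
        ["stringtie", bam, "-G", annotation_gtf, "-p", str(threads),
         "-o", f"{sample}.gtf"],
        # 5. Count reads per gene using the exon / gene_id defaults.
        ["featureCounts", "-p", "-T", str(threads), "-s", str(strandedness),
         "-t", "exon", "-g", "gene_id", "-a", annotation_gtf,
         "-o", f"{sample}.counts.txt", bam],
    ]


if __name__ == "__main__":
    for cmd in build_commands("genome.fa", "genes.gtf", "r1.fq.gz", "r2.fq.gz"):
        print(" ".join(cmd))
```

Building the commands as lists rather than shell strings keeps each argument separate, which makes the steps easy to inspect before running anything.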
RMTA also supports read alignment directly to a transcriptome using the quasi-aligner and transcript abundance quantifier Salmon (Patro et al., 2017; Srivastava et al., 2019). Salmon maps reads to the provided transcript assembly and then counts the number of reads associated with each transcript, generating an output file (quant.sf) that can immediately be used for differential expression. Note: Salmon is only appropriate when the user wants to test rapidly for differential expression; it does not support the identification of novel genes or data visualization in a genome browser.
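To make the Salmon output more concrete, here is a minimal sketch of reading a quant.sf file, which is a tab-separated table with the columns Name, Length, EffectiveLength, TPM, and NumReads. The `read_quant_sf` helper and the example rows are made up for illustration and are not part of RMTA.

```python
import csv
import io
from typing import Dict, TextIO


def read_quant_sf(handle: TextIO) -> Dict[str, float]:
    """Map each transcript name to its estimated read count (NumReads)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: float(row["NumReads"]) for row in reader}


# Made-up rows standing in for a real quant.sf file.
example = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1500\t1320.5\t12.3\t150.0\n"
    "tx2\t900\t720.5\t0.0\t0.0\n"
)

counts = read_quant_sf(io.StringIO(example))
```

With a real file, the same function could be called on `open("quant.sf")` in place of the in-memory example.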
Beyond read mapping and assembly, RMTA has a number of additional features that automate onerous data transformation and quality control steps, thus producing outputs that can be directly used for differential expression analysis, data visualization, or novel gene identification - data analyses that can all be performed in the DE or at CoGe.
Pre-Requisites:
A CyVerse account (Register for a free CyVerse account at https://user.cyverse.org).
An up-to-date Java-enabled web browser.
Genome-guided mapping:
Input data requirements:
Reference Genome (FASTA) or HISAT2 Indexed Reference Genome (in a subdirectory)
Reference Transcriptome (GFF3/GTF/GFF)
RNA-Seq reads (FASTQ), single-end or paired-end (compressed or uncompressed), or multiple NCBI SRA IDs (each SRA ID on a separate row in a text file).
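The SRA ID list mentioned above is a plain text file with one ID per row. A minimal sketch for producing such a file follows; the `format_sra_list` helper is hypothetical, and the assumption that the IDs are run accessions (prefixed SRR, ERR, or DRR) is ours rather than something stated on this page.

```python
import re
from typing import Iterable, List

# Assumed pattern for run accessions such as SRR1234567.
SRA_RUN_PATTERN = re.compile(r"^[SED]RR\d+$")


def format_sra_list(ids: Iterable[str]) -> str:
    """Return file content with one SRA ID per row, without duplicates."""
    cleaned: List[str] = []
    for raw in ids:
        acc = raw.strip()
        if not acc:
            continue  # skip blank entries
        if not SRA_RUN_PATTERN.match(acc):
            raise ValueError(f"not a run accession: {acc!r}")
        if acc not in cleaned:
            cleaned.append(acc)
    return "".join(acc + "\n" for acc in cleaned)


if __name__ == "__main__":
    # Placeholder accessions, not real datasets.
    with open("sra_ids.txt", "w") as out:
        out.write(format_sra_list(["SRR0000001", "SRR0000002"]))
```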
Mandatory fields
Analysis Name
Choose an appropriate name for your analysis and make comments if you wish. Default name is shown in the figure below.
Select the output folder for the results of the analysis.
Genome guided mapping
Custom genome (required)
HISAT2 Indexed folder (for indexed genomes)
Select an aligner
i) HISAT2
ii) Bowtie2
Reference annotation
Feature Count Options
Choose a Feature Type. The default option will be "exon"
Choose a Gene Attribute. The default option will be "gene_id"
Select the Type of Strandedness. The three options include unstranded, stranded, and reversely stranded.
Please refer to your Genome Annotation File (.gtf), and confirm that these settings match your data. For Gene Attribute, be sure that gene_id is written before the name of each gene.
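One way to confirm that the Feature Type and Gene Attribute settings match an annotation is to count how many records of the chosen type carry the chosen attribute. The sketch below does this for a GTF file; the `count_matching_features` helper and the example records are hypothetical and only illustrate the check.

```python
import io
from typing import TextIO, Tuple


def count_matching_features(gtf: TextIO, feature_type: str = "exon",
                            attribute: str = "gene_id") -> Tuple[int, int]:
    """Return (records of feature_type, how many of those carry attribute)."""
    n_type = 0
    n_attr = 0
    for line in gtf:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature_type:
            continue  # column 3 holds the feature type
        n_type += 1
        # GTF attributes (column 9) look like: gene_id "G1"; transcript_id "T1";
        keys = [item.strip().split(" ", 1)[0]
                for item in fields[8].split(";") if item.strip()]
        if attribute in keys:
            n_attr += 1
    return n_type, n_attr


# Made-up records: one gene, one GTF-style exon, one GFF3-style exon.
example = (
    'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "G1";\n'
    'chr1\tsrc\texon\t1\t50\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
    'chr1\tsrc\texon\t60\t100\t.\t+\t.\tID=exon2;Parent=T1\n'
)

result = count_matching_features(io.StringIO(example))
```

If the second number is much smaller than the first, the chosen attribute is probably not how the annotation labels its genes.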
Input reads
Paired-end reads
FASTQ Files (Read 1): HT path list of read 1 files of paired-end data
FASTQ Files (Read 2): HT path list of read 2 files of paired-end data
Single-end reads
i. Single-end FASTQ files or an HT path list of read files of single-end data
SRA
i. Enter the SRA ID, or
ii. Select a file containing a list of SRA IDs (one per line) or an HT path list of multiple SRA ID list files
If you have many files to process through the Discovery Environment, an HT Analysis Path List File may prove useful, as this app takes only one file at a time. For information on how to create an HT path list, see the CyVerse documentation on HT Analysis Path Lists.
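Because paired-end data is supplied as two separate path lists, it can help to check that every Read 1 file has a Read 2 partner and that both lists are in the same order before creating them. The sketch below assumes a `_R1`/`_R2` file naming convention and that the two lists are matched by position; both are assumptions, and the `split_pairs` helper is hypothetical. It only sorts paths and does not create the HT path list file itself.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumed naming convention: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz
READ_PATTERN = re.compile(r"^(.*)_R([12])(\.f(?:ast)?q(?:\.gz)?)$")


def split_pairs(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split paths into Read 1 and Read 2 lists in matching order."""
    read_1: Dict[str, str] = {}
    read_2: Dict[str, str] = {}
    for path in paths:
        match = READ_PATTERN.match(path)
        if match is None:
            raise ValueError(f"unrecognized file name: {path}")
        key = match.group(1) + match.group(3)  # the name without _R1/_R2
        (read_1 if match.group(2) == "1" else read_2)[key] = path
    unpaired = read_1.keys() ^ read_2.keys()
    if unpaired:
        raise ValueError(f"files without a mate: {sorted(unpaired)}")
    keys = sorted(read_1)
    return [read_1[k] for k in keys], [read_2[k] for k in keys]


# Placeholder paths for illustration.
r1_list, r2_list = split_pairs([
    "/data/b_R2.fastq.gz", "/data/a_R1.fastq.gz",
    "/data/b_R1.fastq.gz", "/data/a_R2.fastq.gz",
])
```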
Parameters
Type of Sequence: Choose either Single End or Paired End
Choose RNA strandedness (default is unstranded)
Number of threads (Default is 4)
Run FastQC
Advanced options:
RMTA_Output:
Name of the output folder (Default is RMTA_Output)