RMTA v2.6.3

The DE Quick Start tutorial provides an introduction to basic DE functionality and navigation.

Please work through the documentation and add your comments at the bottom of this page, or email them to support@cyverse.org.

Rationale and background:

  • RMTA is a high-throughput RNA-seq read mapping and transcript assembly workflow. It combines the standard RNA-seq analysis programs, traditionally run one at a time, into a single, easy-to-use workflow that can rapidly process and assemble any amount of local (FASTQ) or NCBI-stored (SRA) RNA-seq data.

  • RMTA maps reads to a user-provided reference genome using either HISAT2 (transcript analysis) or Bowtie2 (SNP analysis), assembles transcripts using StringTie, and then quantifies reads using featureCounts.

  • RMTA also supports read alignment directly to a transcriptome using Salmon, a quasi-aligner and transcript abundance quantifier (Patro et al., 2017; Srivastava et al., 2019). Salmon maps reads to the provided transcript assembly and counts the reads associated with each transcript, generating an output file (quant.sf) that can be used immediately for differential expression analysis. Note: Salmon is only appropriate when the goal is to test rapidly for differential expression; it does not support the identification of novel genes or data visualization in a genome browser.

  • Beyond read mapping and assembly, RMTA has a number of additional features that automate onerous data transformation and quality control steps, thus producing outputs that can be directly used for differential expression analysis, data visualization, or novel gene identification - data analyses that can all be performed in the DE or at CoGe.

Pre-Requisites:

  1. A CyVerse account (register for a free account at https://user.cyverse.org).

  2. An up-to-date Java-enabled web browser.

Genome-guided mapping:

Input data requirements:

  1. Reference Genome (FASTA) or HISAT2 Indexed Reference Genome (in a subdirectory)

  2. Reference Transcriptome (GFF3/GTF/GFF)

  3. RNA-Seq reads (FASTQ), single-end or paired-end (compressed or uncompressed), or one or more NCBI SRA IDs (each SRA ID on a separate line in a text file).
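
The SRA list file in item 3 is plain text with one ID per line. The following is a minimal Python sketch for checking such a file before uploading it. The function name `parse_sra_ids` is hypothetical, and the pattern assumes run accessions of the form SRR, ERR, or DRR followed by digits; RMTA itself may accept or reject other forms.

```python
import re

# Assumption: run accessions look like SRR/ERR/DRR followed by digits.
SRA_RUN_PATTERN = re.compile(r"^[SED]RR\d+$")


def parse_sra_ids(text):
    """Return the SRA IDs found in text, one per non-empty line.

    Raises ValueError if a line does not look like a run accession,
    for example when several IDs were placed on the same line.
    """
    ids = []
    for line_number, raw_line in enumerate(text.splitlines(), start=1):
        candidate = raw_line.strip()
        if not candidate:
            continue  # ignore blank lines
        if not SRA_RUN_PATTERN.match(candidate):
            raise ValueError(
                f"line {line_number}: {candidate!r} is not a run accession"
            )
        ids.append(candidate)
    return ids


if __name__ == "__main__":
    # Hypothetical accessions, used only to illustrate the file layout.
    example = "SRR1234567\nSRR1234568\n\nERR000001\n"
    print(parse_sra_ids(example))
```

To check a real list, read the file contents first (for example `parse_sra_ids(open("sra_ids.txt").read())`, where the file name is a placeholder).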

  1. Mandatory fields

    1. Analysis Name

      1. Choose an appropriate name for your analysis and make comments if you wish. Default name is shown in the figure below.

      2. Select the output folder for the results of the analysis.

    2. Genome guided mapping

      1. Custom genome (required)

      2. HISAT2 Indexed folder (for indexed genomes)


    3. Select an aligner

      i) HISAT2
      ii) Bowtie2

    4. Reference annotation

    5. Feature Count Options

      1. Choose a Feature Type. The default option will be "exon"

      2. Choose a Gene Attribute. The default option will be "gene_id"

      3. Select the Type of Strandedness. The three options include unstranded, stranded, and reversely stranded.

      4. Check your genome annotation file (.gtf) and confirm that these settings match it. For Gene Attribute, make sure that the attribute name (gene_id by default) is written before the identifier of each gene.
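
One way to confirm that the Feature Type and Gene Attribute settings match an annotation file is to tally them directly. The sketch below assumes a standard nine-column, tab-separated GTF whose last column holds `key "value";` pairs. The function name `summarize_gtf` is hypothetical, and the parser is deliberately simple: it does not handle semicolons inside quoted values.

```python
from collections import Counter


def summarize_gtf(lines, feature_type="exon", gene_attribute="gene_id"):
    """Count feature types, and how many feature_type records carry gene_attribute.

    Returns (feature_counts, with_attribute).
    """
    feature_counts = Counter()
    with_attribute = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GTF record
        feature_counts[fields[2]] += 1
        if fields[2] != feature_type:
            continue
        # Attributes look like: gene_id "G1"; transcript_id "T1";
        keys = [item.split()[0] for item in fields[8].split(";") if item.strip()]
        if gene_attribute in keys:
            with_attribute += 1
    return feature_counts, with_attribute


if __name__ == "__main__":
    # Tiny made-up annotation, used only to show the expected layout.
    sample = [
        'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "G1";\n',
        'chr1\tsrc\texon\t1\t50\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    ]
    counts, tagged = summarize_gtf(sample)
    print(dict(counts), tagged)
```

If the count of the chosen feature type is zero, or fewer records carry the attribute than expected, the defaults ("exon" and "gene_id") probably do not fit the annotation and should be changed in the app. An open file handle can be passed as `lines` to check a real annotation.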

    6. Input reads

      Paired-end reads

      1. FASTQ Files (Read 1): HT path list of read 1 files of paired-end data

      2. FASTQ Files (Read 2): HT path list of read 2 files of paired-end data

      Single-end reads

      1. Single-end FASTQ files, or an HT path list of single-end read files

      SRA

      1. Enter the SRA ID, or

      2. Select a file containing a list of SRA IDs (one per line), or an HT path list of multiple SRA ID list files

      If you have many files to process through the Discovery Environment, an HT Analysis Path List file may be useful, because this app takes only one file at a time. For information on how to create an HT path list, see the DE documentation on HT path lists.

    7. Parameters

      1. Type of Sequence: Choose either Single End or Paired End

      2. Choose RNA strandedness (default is unstranded)  

      3. Number of threads (Default is 4)

      4. Run FastQC

       

    8. Advanced options:    



    9. RMTA_Output:

      Name of the output folder (Default is RMTA_Output)