RMTA v1.5

Alert:

The CyVerse App Store is currently being restructured, and apps are being moved to an HPC environment. During this transition, users may occasionally be unable to locate or use apps that are listed in our tutorials. In many cases, these apps can be located by searching for them using the search bar at the top of the Apps window in the DE. To increase the chance of a successful search, do not search on the entire app name and version number; search only on the portion that refers to the app's function or origin (e.g. 'SOAPdenovo' instead of 'SOAPdenovo-Trans 1.01').

Also, as part of the 2.8 app categorization, a number of apps were deprecated and are no longer available, and there is no longer an Archive category. You can search for a suitable replacement in the List of Applications in this window, or search on an app name or tool used for an app in the Apps window search field. If you need an app reinstated, please contact support@cyverse.org.

The DE Quick Start tutorial provides an introduction to basic DE functionality and navigation.

Please work through the documentation and add your comments at the bottom of this page, or email comments to support@cyverse.org. Thank you.

Rationale and background:

HISAT2: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown

Mihaela Pertea, Daehwan Kim, Geo M Pertea, Jeffrey T Leek, Steven L Salzberg

Nature Protocols 11, 1650–1667 (2016). doi:10.1038/nprot.2016.095

HISAT, StringTie and Ballgown provide a complete analysis package (the 'new Tuxedo' package). RNA-seq analysis begins by mapping reads against a reference genome to identify their genomic positions. This mapping information allows us to collect subsets of the reads corresponding to each gene, and then to assemble and quantify the transcripts represented by those reads. HISAT2 is an efficient spliced aligner that replaces TopHat in the new Tuxedo protocol. It uses one global FM index along with many small local FM indexes, a data structure that makes alignment several times faster than TopHat2. If a reference annotation is provided, the built-in Python scripts extract_splice_sites.py and extract_exons.py can extract the splice site and exon information, respectively. The wrapper script then takes the built index and aligns the reads against the reference.

HISAT2 software (http://ccb.jhu.edu/software/hisat2 or http://github.com/infphilo/hisat2, version 2.0.1 or later)
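The index-then-align workflow described above can be sketched as a short Python helper that only assembles the command lines. This is a minimal sketch, not the app's actual wrapper script: the function name, the index base name `genome_index`, and the use of samtools for sorting and indexing are assumptions made for illustration. The `hisat2-build`, `hisat2 -x/-1/-2/-U/-S`, `samtools sort -o` and `samtools index` invocations are standard command-line forms of those tools.

```python
from pathlib import Path


def build_pipeline_commands(reference, reads1, reads2=None, index_base="genome_index"):
    """Return the commands (as argument lists) for the index, align, sort and BAM-index steps.

    The function name and the default index base name are hypothetical.
    """
    # Name intermediate and final files after the read 1 file, e.g. "SRR946914_fastq_1".
    sample = Path(reads1).stem
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"

    # Step 1: build the HISAT2 index from the reference FASTA.
    commands = [["hisat2-build", reference, index_base]]

    # Step 2: align reads; -1/-2 for paired-end data, -U for single-end data.
    align = ["hisat2", "-x", index_base]
    if reads2:
        align += ["-1", reads1, "-2", reads2]
    else:
        align += ["-U", reads1]
    align += ["-S", sam]
    commands.append(align)

    # Step 3: coordinate-sort the alignments into a BAM file.
    commands.append(["samtools", "sort", "-o", bam, sam])
    # Step 4: index the sorted BAM file (writes <bam>.bai next to it).
    commands.append(["samtools", "index", bam])
    return commands


if __name__ == "__main__":
    for cmd in build_pipeline_commands(
        "NC_010473.fa", "SRR946914_fastq_1.fastq", "SRR946914_fastq_2.fastq"
    ):
        print(" ".join(cmd))
```

Returning argument lists instead of running the tools keeps the sketch easy to inspect; each list could be passed to `subprocess.run(cmd, check=True)` on a machine where HISAT2 and samtools are installed.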

Pre-Requisites

A CyVerse account. (Register for a CyVerse account here - user.cyverse.org)

Mandatory arguments
1. Output folder name
2. Input file: Reference genome sequence in FASTA format
3. FASTQ Files (Read 1): Read 1 files for paired-end data, or the read files for single-end data
4. FASTQ Files (Read 2): Read 2 files for paired-end data; leave this field empty for single-end data
5. Fragment Library Type: Specify the format of the library; more details at http://sailfish.readthedocs.io/en/master/library_type.html
6. File type: Enter whether the library is paired end or single end

Optional arguments:
1. Trim bases from 5' end of read: Trim bases from the 5' (left) end of each read before alignment
2. Trim bases from 3' end of read: Trim bases from the 3' (right) end of each read before alignment
3. Phred quality score: Encoding used for the quality scores
4. Minimum intron length: Sets the minimum intron length
5. Maximum intron length: Sets the maximum intron length
6. Report alignments tailored for transcript assemblers including StringTie: With this option, HISAT2 requires longer anchor lengths for de novo discovery of splice sites. This leads to fewer alignments with short anchors, which significantly reduces the computation time and memory usage of transcript assemblers.
7. Minimum fragment length for valid paired-end alignments: The minimum fragment length for valid paired-end alignments.
8. Maximum fragment length for valid paired-end alignments: The maximum fragment length for valid paired-end alignments.
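For readers who want to reproduce a run outside the DE, the optional arguments above correspond to HISAT2 command-line flags. The sketch below shows one plausible translation. The flags themselves (`--trim5`, `--trim3`, `--phred33`/`--phred64`, `--min-intronlen`, `--max-intronlen`, `--dta`, `--minins`, `--maxins`) are HISAT2 options, but the function name, its parameter names, the default values, and the assumption that the app maps its fields to flags in exactly this way are illustrative only.

```python
def hisat2_option_flags(trim5=0, trim3=0, phred=33, min_intron=None, max_intron=None,
                        dta=False, min_frag=None, max_frag=None, paired=True):
    """Translate the optional app fields into HISAT2 flags (hypothetical mapping)."""
    if phred not in (33, 64):
        raise ValueError("phred must be 33 or 64")

    flags = []
    # Read trimming is only added when a non-zero number of bases is requested.
    if trim5:
        flags += ["--trim5", str(trim5)]
    if trim3:
        flags += ["--trim3", str(trim3)]
    # Quality score encoding: --phred33 or --phred64.
    flags.append(f"--phred{phred}")
    # Intron length bounds are left at the HISAT2 defaults unless given.
    if min_intron is not None:
        flags += ["--min-intronlen", str(min_intron)]
    if max_intron is not None:
        flags += ["--max-intronlen", str(max_intron)]
    # Alignments tailored for transcript assemblers such as StringTie.
    if dta:
        flags.append("--dta")
    # Fragment length bounds only apply to paired-end alignments.
    if paired:
        if min_frag is not None:
            flags += ["--minins", str(min_frag)]
        if max_frag is not None:
            flags += ["--maxins", str(max_frag)]
    return flags


if __name__ == "__main__":
    print(" ".join(hisat2_option_flags(trim5=5, dta=True, max_frag=500)))
```

The returned list can be appended to a `hisat2 -x <index> ...` command. Options left unset fall back to the HISAT2 defaults rather than to values chosen by this sketch.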

Test/sample data

The following test data are provided for testing HISAT2 at /iplant/home/shared/iplantcollaborative/example_data/tophat2-PE (we use the same kind of data as for tophat2-PE):

left_reads - SRR946914_fastq_1.fastq, SRR946916_fastq_1.fastq
right_reads - SRR946914_fastq_2.fastq, SRR946916_fastq_2.fastq
reference - NC_010473.fa
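When several paired-end samples are supplied at once, each read 1 file has to be matched with the read 2 file of the same sample. The sketch below shows one way to do this for the test data, assuming the `<sample>_1.fastq` / `<sample>_2.fastq` naming seen in the file list. The function name and the pattern are illustrative and not part of the app.

```python
import re

# Matches names such as "SRR946914_fastq_1.fastq"; captures the sample part and the mate number.
_MATE_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<mate>[12])\.(?:fastq|fq)$")


def pair_reads(left_reads, right_reads):
    """Pair each read 1 file with the read 2 file that shares its sample prefix."""
    right_by_sample = {}
    for name in right_reads:
        match = _MATE_PATTERN.match(name)
        if match is None or match.group("mate") != "2":
            raise ValueError(f"not a read 2 file name: {name}")
        right_by_sample[match.group("sample")] = name

    pairs = []
    for name in left_reads:
        match = _MATE_PATTERN.match(name)
        if match is None or match.group("mate") != "1":
            raise ValueError(f"not a read 1 file name: {name}")
        sample = match.group("sample")
        if sample not in right_by_sample:
            raise ValueError(f"no read 2 file found for sample {sample}")
        pairs.append((name, right_by_sample[sample]))
    return pairs


if __name__ == "__main__":
    left = ["SRR946914_fastq_1.fastq", "SRR946916_fastq_1.fastq"]
    right = ["SRR946916_fastq_2.fastq", "SRR946914_fastq_2.fastq"]
    for read1, read2 in pair_reads(left, right):
        print(read1, read2)
```

Matching on the sample prefix rather than on list position means the pairing still works when the two file lists are given in different orders.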

Results

Successful execution of the HISAT2-index-align pipeline creates a directory named out. The directory contains BAM and BAI files for each sample, which can be used for downstream analysis and visualization:

output

SRR946914_fastq_1.sorted.bam

SRR946914_fastq_1.sorted.bam.bai

SRR946916_fastq_1.sorted.bam

SRR946916_fastq_1.sorted.bam.bai
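A quick way to confirm that a run completed is to check that every read 1 input has its sorted BAM file and BAM index. The sketch below assumes the naming shown in the listing above (the read 1 file name with `.fastq` replaced by `.sorted.bam` and `.sorted.bam.bai`). Both function names are hypothetical.

```python
def expected_outputs(read1_files):
    """List the BAM and BAI names expected for each read 1 input file."""
    names = []
    for read1 in read1_files:
        base = read1
        # Drop the FASTQ extension, e.g. "SRR946914_fastq_1.fastq" -> "SRR946914_fastq_1".
        for suffix in (".fastq", ".fq"):
            if base.endswith(suffix):
                base = base[: -len(suffix)]
                break
        names.append(f"{base}.sorted.bam")
        names.append(f"{base}.sorted.bam.bai")
    return names


def missing_outputs(existing_names, read1_files):
    """Return the expected output names that are absent from existing_names."""
    present = set(existing_names)
    return [name for name in expected_outputs(read1_files) if name not in present]


if __name__ == "__main__":
    found = ["SRR946914_fastq_1.sorted.bam", "SRR946914_fastq_1.sorted.bam.bai"]
    reads = ["SRR946914_fastq_1.fastq", "SRR946916_fastq_1.fastq"]
    print(missing_outputs(found, reads))
```

In practice `existing_names` could come from `os.listdir()` on the downloaded results folder. An empty result means every sample has both files.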