...
Pre-Requisites
- A CyVerse account. (Register for a CyVerse account at user.cyverse.org)
- Mandatory arguments
- Output folder name Input file reference genome sequence in fasta format
- Hisat2 reference genome: Select at least one of the three options below for indexing the reference genome
- Custom Reference genome
- Select reference genome from the list
- Hisat2 Indexed folder
- Hisat2 reference annotation: Select at least one of the two options below to use as the annotation
- Custom Reference annotation
- Select reference annotation from the list
- Paired-end reads
- FASTQ Files (Read 1): Input read 1 files of paired-end data
- FASTQ Files (Read 2): Input read 2 files of paired-end data
- Fragment Library Type: Specify the format of the library. More details: http://sailfish.readthedocs.io/en/master/library_type.html
- File type: Enter whether the library is paired end or single end
- Single-end reads
- single end FASTQ files
- SRA
- SRA ID: A single SRA ID that you want to use
- File containing SRA IDs: A file listing multiple SRA IDs that you want to use
- Cufflinks/StringTie: Check exactly one of the two options below. Both cannot be selected
- StringTie
- Cufflinks
- Coverage cut-off threshold: Select from 0-5
- FPKM cut-off threshold: FPKM cut-off you want to use to filter the transcripts
- Cuffmerge: Run Cuffmerge on the StringTie/Cufflinks GTF files (only works with more than one sample file)
- Advanced options
- Phred quality score: Encoding for the quality score: Phred64 (default is Phred33)
- Fragment Library Type: specify the format of the library either FR, RF, F, R etc.
- Trim bases from 5' end of read: Trim bases from 5' (left) end of each read before alignment
- Trim bases from 3' end of read: Trim bases from 3' (right) end of each read before alignment
- Minimum intron length: Sets the minimum intron length
- Maximum intron length: Sets the maximum intron length
- Report alignments tailored for transcript assemblers including StringTie: With this option, HISAT2 requires longer anchor lengths for de novo discovery of splice sites. This leads to fewer alignments with short anchors, which helps transcript assemblers significantly reduce computation time and memory usage.
- Minimum fragment length for valid paired-end alignments: Sets the minimum fragment length for valid paired-end alignments
- Maximum fragment length for valid paired-end alignments: Sets the maximum fragment length for valid paired-end alignments
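To make the advanced options more concrete, the following is a minimal sketch of how they might translate to a HISAT2 command line. It assumes each option corresponds to the HISAT2 flag with the same purpose (`--phred33`/`--phred64`, `--rna-strandness`, `--trim5`, `--trim3`, `--min-intronlen`, `--max-intronlen`, `--minins`, `--maxins`, `--dta`). The function name `build_hisat2_args`, the dictionary keys, and the example values are hypothetical, and the command RMTA actually runs may differ.

```python
def build_hisat2_args(options):
    """Translate a dict of advanced options into HISAT2 arguments.

    The dict keys are hypothetical names chosen for this sketch.
    """
    args = []
    # Quality encoding: Phred+33 unless Phred+64 is explicitly requested.
    args.append("--phred64" if options.get("phred64") else "--phred33")

    # Options that carry a value, paired with the assumed HISAT2 flag.
    valued = [
        ("library_type", "--rna-strandness"),   # e.g. FR, RF, F, R
        ("trim5", "--trim5"),                   # bases trimmed from the 5' end
        ("trim3", "--trim3"),                   # bases trimmed from the 3' end
        ("min_intron", "--min-intronlen"),
        ("max_intron", "--max-intronlen"),
        ("min_fragment", "--minins"),           # paired-end only
        ("max_fragment", "--maxins"),           # paired-end only
    ]
    for key, flag in valued:
        if options.get(key) is not None:
            args += [flag, str(options[key])]

    # Alignments tailored for transcript assemblers such as StringTie.
    if options.get("dta"):
        args.append("--dta")
    return args


example = build_hisat2_args(
    {"library_type": "FR", "trim5": 5, "min_intron": 20, "dta": True}
)
print(" ".join(example))
# prints: --phred33 --rna-strandness FR --trim5 5 --min-intronlen 20 --dta
```

Options that are left unset produce no flag, so HISAT2 falls back to its own defaults for them.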
The following test data are provided for testing RMTA in /iplant/home/shared/iplantcollaborative/example_data/tophat2-PE (we use data similar to that used for tophat2-PE):
- Reference Genome: Sorghum_bicolor.Sorbi1.20.dna.toplevel_chr8.fa
- Reference Annotation: Sorghum_bicolor.Sorbi1.20_chr8.gtf
- Left reads: SRR946914_fastq_1.fastq, SRR946916_fastq_1.fastq
- Right reads: SRR946914_fastq_2.fastq, SRR946916_fastq_2.fastq
- StringTie
- Fragment Library Type: FR
Leave the rest as default
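When several paired-end samples are supplied, the read 1 and read 2 files must be matched correctly. The sketch below shows one way to check the pairing before launching a job. It assumes the `_1.fastq` / `_2.fastq` naming used by the test files; the helper `pair_reads` is hypothetical and is not part of RMTA.

```python
import re


def pair_reads(filenames):
    """Group paired-end FASTQ names into (read1, read2) tuples per sample."""
    # Assumed naming scheme: <sample>_1.fastq and <sample>_2.fastq
    pattern = re.compile(r"^(?P<sample>.+)_(?P<mate>[12])\.fastq$")
    mates = {}
    for name in filenames:
        match = pattern.match(name)
        if match is None:
            raise ValueError(f"unexpected file name: {name}")
        mates.setdefault(match.group("sample"), {})[match.group("mate")] = name

    pairs = []
    for sample in sorted(mates):
        found = mates[sample]
        if set(found) != {"1", "2"}:
            raise ValueError(f"sample {sample} is missing a mate file")
        pairs.append((found["1"], found["2"]))
    return pairs


print(pair_reads(["SRR946916_fastq_2.fastq", "SRR946916_fastq_1.fastq"]))
# prints: [('SRR946916_fastq_1.fastq', 'SRR946916_fastq_2.fastq')]
```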
Results
Successful execution of the HISAT2-index-align assessment pipeline creates a directory named out. The directory contains BAM and BAI files for each sample, which can be used for downstream analysis and visualization:
output
SRR946914_fastq_1.sorted.bam
SRR946914_fastq_1.sorted.bam.bai
SRR946916_fastq_1.sorted.bam
SRR946916_fastq_1.sorted.bam.bai
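A quick way to confirm that a run completed is to check that every sorted BAM file has a matching index. The following is a minimal sketch; the helper `missing_indexes` is hypothetical, and it relies only on the `.bam` / `.bam.bai` naming shown in the listing above.

```python
def missing_indexes(filenames):
    """Return the BAM files in `filenames` that have no matching .bai index."""
    names = set(filenames)
    return sorted(
        name for name in names
        if name.endswith(".bam") and name + ".bai" not in names
    )


listing = [
    "SRR946914_fastq_1.sorted.bam",
    "SRR946914_fastq_1.sorted.bam.bai",
    "SRR946916_fastq_1.sorted.bam",
    "SRR946916_fastq_1.sorted.bam.bai",
]
print(missing_indexes(listing))
# prints: []
```

For a downloaded copy of the results, the listing could be obtained with `os.listdir` on the output directory.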
RMTA will generate two output folders:
- Index: This folder consists of the index of the genome
- Output: This folder consists of the output from HISAT2, StringTie, and Cuffcompare (please refer to the manuals of these programs for an explanation of their outputs)
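To illustrate what an FPKM cut-off does to the assembled transcripts, the sketch below filters GTF transcript records by their `FPKM` attribute. It assumes that transcript lines carry an attribute of the form `FPKM "value"`, as in StringTie GTF output, and that records at or above the cut-off are kept. Whether RMTA uses `>=` or `>` is not stated here, so treat that as an assumption. The function `filter_by_fpkm` and the two example records are made up for illustration.

```python
import re

# Matches the FPKM attribute in the ninth GTF column, e.g. FPKM "3.2";
FPKM_RE = re.compile(r'FPKM "([^"]+)"')


def filter_by_fpkm(gtf_lines, cutoff):
    """Keep transcript records whose FPKM attribute is >= cutoff."""
    kept = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only transcript-level records are considered
        match = FPKM_RE.search(fields[8])
        if match and float(match.group(1)) >= cutoff:
            kept.append(line)
    return kept


# Illustrative records only, not real RMTA output.
records = [
    '8\tStringTie\ttranscript\t100\t900\t1000\t+\t.\t'
    'gene_id "G1"; transcript_id "G1.1"; cov "12.5"; FPKM "3.2"; TPM "5.1";',
    '8\tStringTie\ttranscript\t2000\t2600\t1000\t-\t.\t'
    'gene_id "G2"; transcript_id "G2.1"; cov "0.8"; FPKM "0.4"; TPM "0.6";',
]
print(len(filter_by_fpkm(records, cutoff=1.0)))
# prints: 1
```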