Rascaf V1.0.2 in Discovery Environment
Rationale and background:
RASCAF
Rascaf: A tool for genome assembly improvement with RNA-seq data.
Song L, Shankar D, Florea L (2016). The Plant Genome 2016 9(3) (To appear), doi: 10.3835/plantgenome2016.03.0027 (Advance access).
Rascaf (RnA-seq SCAFfolder) is a fast and efficient tool that leverages the long-range continuity information from intron-spanning RNA-seq read pairs to detect new contig connections and improve the assembly, particularly in gene regions.
Pre-Requisites:
- A CyVerse account. (Register for a CyVerse account here - user.cyverse.org)
- Input
a. Mandatory
-b bamfile(s): one or more BAM files
-f path_to_assembly: path to the raw assembly FASTA file
b. Optional
-ms minreads: minimum support for connecting two contigs (default: 2)
-ml minlen: minimum exonic length if no intron (default: 200)
-k wordsize: the size of a k-mer (<=32; default: 21)
- Output
-o filename: prefix for the output file name
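The options above can be sketched as a command line. The following Python snippet is a minimal illustration, not the way the Discovery Environment app itself invokes the tool: the function name build_rascaf_command, the executable name "rascaf", and the choice to build one command per BAM file are assumptions made for this example. Only the flags (-b, -f, -ms, -ml, -k, -o), their defaults, and the k-mer limit come from the list above.

```python
from typing import List, Optional

MAX_KMER = 32  # upper bound on -k stated in the option list


def build_rascaf_command(
    bam: str,
    assembly: str,
    prefix: Optional[str] = None,
    min_reads: int = 2,    # -ms default
    min_len: int = 200,    # -ml default
    kmer: int = 21,        # -k default
    executable: str = "rascaf",  # assumption: name of the binary on PATH
) -> List[str]:
    """Return an argument list for one Rascaf run on a single BAM file."""
    if not 1 <= kmer <= MAX_KMER:
        raise ValueError(f"-k must be between 1 and {MAX_KMER}, got {kmer}")
    if min_reads < 1:
        raise ValueError(f"-ms must be at least 1, got {min_reads}")
    if min_len < 0:
        raise ValueError(f"-ml must not be negative, got {min_len}")

    cmd = [
        executable,
        "-b", bam,
        "-f", assembly,
        "-ms", str(min_reads),
        "-ml", str(min_len),
        "-k", str(kmer),
    ]
    # The output prefix is optional, so -o is only added when one is given.
    if prefix is not None:
        cmd += ["-o", prefix]
    return cmd


if __name__ == "__main__":
    # Hypothetical local file names, used only to show the resulting command.
    print(" ".join(build_rascaf_command("sample.bam", "sample.fa", prefix="sample")))
```

The returned list can be passed to subprocess.run if Rascaf is installed locally; checking -k before launching avoids submitting a run with a k-mer size above the documented limit.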
Test/Sample Data
The test data for Rascaf can be found here: /iplant/home/shared/iplantcollaborative/example_data/rascaf
Test Run
- Open Rascaf in DE
- Choose the sample input file from /iplant/home/shared/iplantcollaborative/example_data/rascaf/sample.bam
- Select the FASTA genome file from /iplant/home/shared/iplantcollaborative/example_data/rascaf/sample.fa
- Optionally set the other arguments, such as -ms, -ml, -k
- Optionally set the prefix for the output file name
- Launch the analysis
Test Results:
Rascaf produces at least three output files. If multiple BAM input files are provided, it generates one file with the extension '.out' for each input file. The following is the output when running Rascaf with the sample data and default parameters: