Rascaf V1.0.2 in Discovery Environment

Rationale and background:

RASCAF

Rascaf: A tool for genome assembly improvement with RNA-seq data.

Song L, Shankar D, Florea L (2016). The Plant Genome 9(3) (to appear), doi: 10.3835/plantgenome2016.03.0027 (advance access).

Rascaf (RnA-seq SCAFfolder) is a fast and efficient tool that uses the long-range continuity information from intron-spanning RNA-seq read pairs to detect new contig connections and improve the assembly, particularly in gene regions.


Pre-Requisites:

  1. A CyVerse account. (Register for a CyVerse account at user.cyverse.org)
  2. Input
     a. Mandatory
         -b bamfile(s): one or more BAM files
         -f path_to_assembly: path to the raw assembly FASTA file
     b. Optional
         -ms minreads: minimum number of supporting reads required to connect two contigs (default: 2)
         -ml minlen: minimum exonic length when there is no intron (default: 200)
         -k wordsize: k-mer size (<=32; default: 21)
  3. Output
        -o filename:  prefix for the output file name
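
The options above map onto a command line that can be sketched as follows. This is only an illustration of how the documented flags fit together and which value ranges they accept. The executable name "rascaf" and the helper build_rascaf_args are assumptions made for this example; the DE app assembles the actual command itself, and how it handles several BAM files internally is not described here, so the sketch covers a single BAM file.

```python
from typing import List, Optional


def build_rascaf_args(
    bam: str,
    assembly: str,
    min_reads: int = 2,
    min_len: int = 200,
    word_size: int = 21,
    prefix: Optional[str] = None,
) -> List[str]:
    """Build an argument list from the options documented above.

    Hypothetical helper: the executable name "rascaf" is an assumption.
    """
    if not bam:
        raise ValueError("a BAM file is mandatory (-b)")
    if not assembly:
        raise ValueError("the raw assembly FASTA file is mandatory (-f)")
    if min_reads < 1:
        raise ValueError("-ms must be at least 1")
    if min_len < 0:
        raise ValueError("-ml must not be negative")
    if not 1 <= word_size <= 32:
        raise ValueError("-k must be between 1 and 32")

    args = [
        "rascaf",              # assumed executable name
        "-b", bam,
        "-f", assembly,
        "-ms", str(min_reads),
        "-ml", str(min_len),
        "-k", str(word_size),
    ]
    if prefix:
        # -o is optional; it is only added when a prefix is given
        args += ["-o", prefix]
    return args


if __name__ == "__main__":
    print(" ".join(build_rascaf_args("sample.bam", "sample.fa")))
```

Returning a list rather than a single string keeps file paths containing spaces intact if the list is later handed to something like subprocess.run.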

Test/Sample Data

The test data for Rascaf can be found here: /iplant/home/shared/iplantcollaborative/example_data/rascaf

Test Run

  1. Open Rascaf in the DE
  2. Select the sample BAM input file from /iplant/home/shared/iplantcollaborative/example_data/rascaf/sample.bam
  3. Select the FASTA genome file from /iplant/home/shared/iplantcollaborative/example_data/rascaf/sample.fa
  4. Optionally set the other arguments, such as -ms, -ml, -k
  5. Optionally set the prefix for the output file name
  6. Launch the analysis

Test Results: 

A run produces at least three output files. The software generates one file with the extension '.out' for each BAM input file, so providing multiple BAM files yields additional '.out' files. Running Rascaf on the sample data with default parameters produces the following output:

  1. rascaf_scaffold.info
  2. rascaf_scaffold.fa
  3. sample_0.out
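
After downloading the results folder, a quick check that the expected files are present could look like the sketch below. It relies only on what is stated above: one '.out' file per BAM input, plus a '.fa' and a '.info' file. The function name summarize_outputs is hypothetical, and matching by file extension alone is an assumption made for simplicity.

```python
from typing import Dict, List


def summarize_outputs(filenames: List[str], n_bam_inputs: int) -> Dict[str, List[str]]:
    """Group result files by extension and report anything unexpected.

    Hypothetical helper: files are matched by extension only.
    """
    out_files = sorted(f for f in filenames if f.endswith(".out"))
    fasta_files = sorted(f for f in filenames if f.endswith(".fa"))
    info_files = sorted(f for f in filenames if f.endswith(".info"))

    problems = []
    if len(out_files) != n_bam_inputs:
        problems.append(
            f"expected {n_bam_inputs} '.out' file(s), found {len(out_files)}"
        )
    if not fasta_files:
        problems.append("no '.fa' scaffold file found")
    if not info_files:
        problems.append("no '.info' file found")

    return {
        "out": out_files,
        "fasta": fasta_files,
        "info": info_files,
        "problems": problems,
    }


if __name__ == "__main__":
    # File names from the test run with a single BAM input
    listing = ["rascaf_scaffold.info", "rascaf_scaffold.fa", "sample_0.out"]
    print(summarize_outputs(listing, n_bam_inputs=1)["problems"])
```

An empty problems list means the folder contains the file types described above; it says nothing about whether the scaffolds themselves are correct.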