RseqFilt-1.0

For an introduction to using the DE, see Using the Discovery Environment.

Please work through the tutorial and add your comments at the bottom of this page, or use the intercom button at the bottom right of this page to post your questions. Thank you.

Rationale and background

RseqFilt is an automated sequence-filtering tool for single-end and paired-end high-throughput RNA-seq data generated on Illumina sequencing platforms.

Features

  • Simultaneously filters and/or trims reads for adapter or primer contamination, uncalled bases (N), GC content, and low quality
  • Supports single-end and paired-end reads
  • Analyzes multiple samples simultaneously
  • Uses parallel computation to speed up the analysis
  • Produces visualizations and statistics
  • Does not depend on any external open-source tool
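
To make the content-based filters concrete, here is a minimal Python sketch of how the GC fraction and the fraction of uncalled bases (N) of a single read could be computed and checked. It is an illustration only, not RseqFilt's own code; the function names and the thresholds `max_n` and `gc_range` are hypothetical.

```python
from typing import Tuple


def base_fractions(seq: str) -> Tuple[float, float]:
    """Return (gc_fraction, n_fraction) for one read sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0, 0.0
    gc = seq.count("G") + seq.count("C")
    n = seq.count("N")
    return gc / len(seq), n / len(seq)


def passes_content_filters(seq: str, max_n: float = 0.1,
                           gc_range: Tuple[float, float] = (0.2, 0.8)) -> bool:
    """Keep a read only if its N fraction and GC fraction are acceptable.

    The default thresholds are hypothetical values chosen for illustration.
    """
    gc, n = base_fractions(seq)
    return n <= max_n and gc_range[0] <= gc <= gc_range[1]
```

For example, `base_fractions("GGCCAATTNN")` gives a GC fraction of 0.4 and an N fraction of 0.2, so that read would be rejected under the illustrative `max_n` of 0.1.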

Prerequisites

  1. A CyVerse account. (Register for a CyVerse account at https://user.cyverse.org/.)
  2. An up-to-date Java-enabled web browser. (Firefox is recommended. If you wish to work with your own large datasets and upload them using iCommands, Chrome is not suitable because of its problems with 64-bit Java.)

Inputs

  1. Single-end input files or left files for paired-end data (Mandatory): FASTQ files (.fastq, .fq)
  2. Right files for paired-end data (Optional): FASTQ files (.fastq, .fq)
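
Before submitting paired-end data, it can help to confirm that the left and right files hold the same number of reads. The following Python sketch shows one way to do that, assuming the common four-lines-per-record FASTQ layout. The helper names are hypothetical and this check is not part of RseqFilt.

```python
def read_fastq(lines):
    """Yield (header, sequence, quality) from an iterable of FASTQ lines.

    Assumes four lines per record (no wrapped sequences).
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, "").rstrip("\r\n")
        next(it, "")  # '+' separator line, ignored
        qual = next(it, "").rstrip("\r\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header, seq, qual


def count_reads(lines):
    """Count the records in an iterable of FASTQ lines."""
    return sum(1 for _ in read_fastq(lines))


def pairs_in_sync(left_lines, right_lines):
    """Return True if both inputs contain the same number of reads."""
    return count_reads(left_lines) == count_reads(right_lines)
```

Both functions accept any iterable of lines, so open file handles work directly, for example `with open("sample_R1_001.fastq") as left, open("sample_R2_001.fastq") as right: print(pairs_in_sync(left, right))`.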

Parameters (Optional)

  1. Quality Value Format: quality value format [1 = Illumina 1.8, 2 = Illumina 1.3, 3 = Sanger]. If no format is provided, it is detected automatically from the sequence data
  2. Filter out reads that contain the given percentage of uncalled bases (N)
  3. Trim the adapter and truncate the read sequence
  4. Filter out reads that are shorter than the minimum size
  5. Truncate the read sequence if the fraction of it matching the adapter sequence is equal to or greater than the given value (0.0-1.0) [default: 0.9]
  6. Filter out a read if the average quality of its bases is lower than the threshold (1-40) [default: 20]
  7. Trim low-quality reads instead of discarding them. The box is unchecked by default
  8. Window size for trimming reads (5'->3'). This option should always be set when the -trim option is defined [default: 5]
  9. Minimum length of reads to retain after trimming
  10. Number of CPUs [default: 4]
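
The quality-related parameters above (average-quality threshold, trim-instead-of-discard, window size, and minimum length after trimming) can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not RseqFilt's actual algorithm: the quality string is assumed to be Phred+33 encoded (Sanger and Illumina 1.8; Illumina 1.3 uses an offset of 64), the read is cut at the first window whose mean quality falls below the threshold, and all function names as well as the `min_len` default are hypothetical.

```python
PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8 encoding (Illumina 1.3 uses 64)


def phred_scores(qual, offset=PHRED_OFFSET):
    """Convert a FASTQ quality string into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in qual]


def mean_quality(qual, offset=PHRED_OFFSET):
    """Average Phred score of a read (0.0 for an empty read)."""
    scores = phred_scores(qual, offset)
    return sum(scores) / len(scores) if scores else 0.0


def window_trim(seq, qual, window=5, threshold=20, offset=PHRED_OFFSET):
    """Scan 5'->3' and cut the read at the first window whose mean
    quality is below the threshold (illustrative strategy)."""
    scores = phred_scores(qual, offset)
    if not scores:
        return seq, qual
    for start in range(max(len(scores) - window + 1, 1)):
        chunk = scores[start:start + window]
        if sum(chunk) / len(chunk) < threshold:
            return seq[:start], qual[:start]
    return seq, qual


def filter_or_trim(seq, qual, threshold=20, trim=False, window=5, min_len=1):
    """Return the (possibly trimmed) read, or None if it is discarded."""
    if mean_quality(qual) >= threshold:
        return seq, qual  # read passes the average-quality filter
    if not trim:
        return None  # default behavior: discard low-quality reads
    seq, qual = window_trim(seq, qual, window, threshold)
    return (seq, qual) if len(seq) >= min_len else None
```

With the quality string `III#######` (three bases at Q40 followed by seven at Q2), the mean quality is 13.4, so the read is discarded by default. With `trim=True` it is instead cut down to its first base, and it is still dropped if `min_len` is larger than the trimmed length.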

Outputs

  1. Output file format (fastq/fasta) [default: fastq]
  2. No figures: check this box to suppress figure generation. The default is False (unchecked), so figures are produced

Test Data

The test data for RseqFilt is located at /iplant/home/shared/iplantcollaborative/example_data/RseqFilt

Inputs

  1. Single end input files or left files for paired-end data (Mandatory): /iplant/home/shared/iplantcollaborative/example_data/RseqFilt/sample_R1_001.fastq
  2. Right files for paired-end data (Optional): /iplant/home/shared/iplantcollaborative/example_data/RseqFilt/sample_R2_001.fastq

Parameters (Optional)

Leave all parameters at their default values.

Outputs

Leave the output options at their default values.

Successful completion of the analysis produces two folders, `logs` and `sample_R1_001_filtering_out`. The `sample_R1_001_filtering_out` folder contains the following files:

Command.log
Statistics.txt
sample_R1_001_Basedist.png
sample_R1_001_Clean.fastq
sample_R1_001_GCdist.png
sample_R1_001_QualGroup.png
sample_R1_001_Qualdist.png
sample_R2_001_Basedist.png
sample_R2_001_Clean.fastq
sample_R2_001_GCdist.png
sample_R2_001_QualGroup.png
sample_R2_001_Qualdist.png
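
After downloading the output folder, a quick check can confirm that every file in the listing above is present. The Python sketch below builds the expected file names from the sample names. It assumes the naming pattern seen in the test run (default fastq output with figures enabled) carries over to other samples, and the helper names are hypothetical.

```python
from pathlib import Path

# Suffixes and shared files taken from the test-run listing above.
PER_SAMPLE_SUFFIXES = (
    "_Basedist.png",
    "_Clean.fastq",
    "_GCdist.png",
    "_QualGroup.png",
    "_Qualdist.png",
)
COMMON_FILES = ("Command.log", "Statistics.txt")


def expected_outputs(samples):
    """Return the sorted file names expected for the given sample names."""
    names = list(COMMON_FILES)
    for sample in samples:
        names.extend(sample + suffix for suffix in PER_SAMPLE_SUFFIXES)
    return sorted(names)


def missing_outputs(folder, samples):
    """List the expected files that are not present in the folder."""
    folder = Path(folder)
    present = {p.name for p in folder.iterdir()} if folder.is_dir() else set()
    return [name for name in expected_outputs(samples) if name not in present]
```

For the test data, `missing_outputs("sample_R1_001_filtering_out", ["sample_R1_001", "sample_R2_001"])` should return an empty list when the run completed and all files were downloaded.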