For an introduction to using the DE, see Using the Discovery Environment.

Rationale and background

RseqFilt is an automated sequence filtering tool for single-end and paired-end high-throughput RNA-seq data generated on Illumina sequencing platforms.

Key features

  • Simultaneously filter and/or trim reads for adapter or primer contamination, uncalled bases (N), GC content, and low-quality reads
  • Supports single and paired-end reads
  • Analyze multiple samples simultaneously
  • Parallel computation to speed up analysis
  • Visualization and statistics
  • No dependencies on external open-source tools
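
The read-level checks in the feature list (uncalled bases, GC content, adapter contamination) can be illustrated with a minimal Python sketch. This is not RseqFilt's implementation: the function names (`n_percent`, `gc_percent`, `truncate_adapter`, `keep_read`) and every threshold (5% N, 20-80% GC, a 5-base minimum adapter overlap) are hypothetical choices for illustration. The adapter string `AGATCGGAAGAGC` is the common prefix of Illumina TruSeq adapters.

```python
# Hypothetical sketch of read-level contamination checks; not RseqFilt's code.
# All thresholds below are illustrative assumptions, not RseqFilt defaults.


def n_percent(seq: str) -> float:
    """Percentage of uncalled bases (N) in a read."""
    if not seq:
        return 0.0
    return 100.0 * seq.upper().count("N") / len(seq)


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a read."""
    if not seq:
        return 0.0
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def truncate_adapter(seq: str, adapter: str, min_match: float = 0.9,
                     min_overlap: int = 5) -> str:
    """Cut the read at the first position where the adapter (or, near the
    3' end, a prefix of it) matches with identity >= min_match."""
    s, a = seq.upper(), adapter.upper()
    for start in range(len(s)):
        overlap = min(len(a), len(s) - start)
        if overlap < min_overlap:
            break  # remaining 3' overlaps are too short to trust
        matches = sum(1 for i in range(overlap) if s[start + i] == a[i])
        if matches / overlap >= min_match:
            return seq[:start]
    return seq


def keep_read(seq: str, max_n_pct: float = 5.0,
              gc_range: tuple = (20.0, 80.0)) -> bool:
    """True if the read passes the (hypothetical) N and GC limits."""
    low, high = gc_range
    return n_percent(seq) <= max_n_pct and low <= gc_percent(seq) <= high


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # common prefix of Illumina TruSeq adapters
    read = "TTTTTTTTTT" + adapter
    print(truncate_adapter(read, adapter))  # TTTTTTTTTT
    print(keep_read("ACGTACGTAC"), keep_read("NNNNACGT"))  # True False
```

The `min_match` argument mirrors the 0.0-1.0 adapter match fraction described in the parameters; the minimum overlap guards against cutting reads on a chance match of a few bases at the 3' end.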

Prerequisites

  1. A CyVerse account. (Register for a CyVerse account at
  2. An up-to-date, Java-enabled web browser. (Firefox is recommended. If you plan to work with your own large datasets and upload them using iCommands, Chrome is not suitable because of its issues with 64-bit Java.)

Input File(s)

  1. Single-end input files or left files for paired-end data (Mandatory): FASTQ files (.fastq, .fq)
  2. Right files for paired-end data (Optional): FASTQ files (.fastq, .fq)

Parameters (Optional)

  1. Quality Value Format: quality value encoding [1 = Illumina 1.8, 2 = Illumina 1.3, 3 = Sanger]. If no format is provided, it is detected automatically from the sequence data
  2. Filter out reads containing the given percentage of uncalled bases (N)
  3. Trim the adapter sequence and truncate the read
  4. Filter out reads shorter than the minimum length
  5. Truncate the read if it matches the adapter sequence by the given fraction or more (0.0-1.0) [default: 0.9]
  6. Filter out reads whose average base quality is below the threshold (1-40) [default: 20]
  7. Trim low-quality reads instead of discarding them [default: unchecked]
  8. Window size for trimming reads (5' to 3'). Always set this option when the -trim option is used [default: 5]
  9. Minimum length of reads to retain after trimming
  10. Number of CPUs [default: 4]
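
The quality-related options (encoding detection, average-quality filtering, and 5'-to-3' window trimming with a minimum retained length) can be sketched in Python. Sanger and Illumina 1.8 quality strings use a Phred+33 offset, while Illumina 1.3 uses Phred+64. The function names, the detection heuristic, and the `min_len=25` default are hypothetical; the trimming rule shown (cut at the first window whose mean falls below the threshold, keeping that window's leading bases that pass individually) is one common variant, not necessarily the one RseqFilt uses.

```python
from typing import Iterable, Optional, Tuple


def detect_phred_offset(quality_strings: Iterable[str]) -> int:
    """Guess the quality offset from observed characters (heuristic only).

    Phred+64 (Illumina 1.3) never uses characters below '@' (ASCII 64), so any
    such character implies Phred+33 (Sanger / Illumina 1.8).
    """
    lowest = min((ord(c) for q in quality_strings for c in q), default=None)
    if lowest is None:
        raise ValueError("no quality characters to inspect")
    return 33 if lowest < 64 else 64


def mean_quality(qual: str, offset: int = 33) -> float:
    """Average Phred score of a read."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def sliding_window_trim(seq: str, qual: str, offset: int = 33, window: int = 5,
                        threshold: int = 20) -> Tuple[str, str]:
    """Scan 5'->3' and cut at the first window whose mean score < threshold.

    Leading bases of that window that individually pass are kept. Reads
    shorter than one window are returned unchanged.
    """
    scores = [ord(c) - offset for c in qual]
    cut = len(scores)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) < threshold * window:
            cut = start
            while cut < start + window and scores[cut] >= threshold:
                cut += 1
            break
    return seq[:cut], qual[:cut]


def filter_read(seq: str, qual: str, offset: int = 33, threshold: int = 20,
                trim: bool = False, window: int = 5,
                min_len: int = 25) -> Optional[Tuple[str, str]]:
    """Return the (possibly trimmed) read, or None if it should be discarded."""
    if trim:
        # Trim instead of discarding, then enforce the minimum retained length.
        seq, qual = sliding_window_trim(seq, qual, offset, window, threshold)
        return (seq, qual) if len(seq) >= min_len else None
    # Without trimming, discard reads whose average quality is too low.
    return (seq, qual) if mean_quality(qual, offset) >= threshold else None
```

For example, with Phred+33 qualities `IIIII#####` (five Q40 bases followed by five Q2 bases), the read's average is 21, so it passes the default filter, while window trimming keeps only the first five bases.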

Output

  1. Output file format (fastq/fasta) [default: fastq]
  2. Figure generation: by default (unchecked), no figures are produced. Check this box to generate figures
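
The practical difference between the two output formats can be sketched in Python: FASTA keeps only the header and sequence, so per-base quality scores are lost. `fastq_to_fasta` is a hypothetical helper that assumes unwrapped 4-line FASTQ records; it is not how RseqFilt writes its output.

```python
def fastq_to_fasta(fastq_text: str) -> str:
    """Convert 4-line FASTQ records to FASTA, dropping the quality lines."""
    lines = fastq_text.splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    out = []
    for i in range(0, len(lines), 4):
        header, seq = lines[i], lines[i + 1]
        if not header.startswith("@"):
            raise ValueError(f"malformed FASTQ header: {header!r}")
        out.append(">" + header[1:])  # FASTA headers start with '>'
        out.append(seq)  # the '+' separator and quality line are discarded
    return "\n".join(out) + "\n" if out else ""


if __name__ == "__main__":
    print(fastq_to_fasta("@r1\nACGT\n+\nIIII\n@r2\nGG\n+\nII\n"), end="")
```

Keeping the default FASTQ output preserves quality scores for any downstream tool that needs them.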

Test Data

The test data for RseqFilt is located at /iplant/home/shared/iplantcollaborative/example_data/RseqFilt

Input File(s)

  1. Single-end input files or left files for paired-end data (Mandatory): /iplant/home/shared/iplantcollaborative/example_data/RseqFilt/sample_R1_001.fastq
  2. Right files for paired-end data (Optional): /iplant/home/shared/iplantcollaborative/example_data/RseqFilt/sample_R2_001.fastq

Parameters (Optional)

Leave all parameters at their default values.

Output

Leave all output options at their default values.

A successful run produces two folders, `logs` and `sample_R1_001_filtering_out`. The `sample_R1_001_filtering_out` folder contains the following files: