For an introduction to using the DE, see Using the Discovery Environment.

Please work through the tutorial and add your comments at the bottom of this page, or use the intercom button at the bottom right of this page to post your question. Thank you.

Rationale and background

RseqFilt is an automated sequence filtering tool for single-end and paired-end high-throughput RNA-seq data generated on Illumina sequencing platforms.

Features

Simultaneously filter and/or trim reads for adapter or primer contamination, uncalled bases (N), GC content, and low-quality reads
Supports single-end and paired-end reads
Analyze multiple samples simultaneously
Parallel computation to speed up analysis
Visualization and statistics
No dependencies on external open-source tools
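The N and GC criteria above come down to simple per-read percentages. The Python sketch below illustrates those two statistics. It is not RseqFilt's implementation. The helper names are hypothetical, GC is assumed to be computed over the full read length (Ns included in the denominator), and the source does not say whether a read sitting exactly at the N cutoff is kept, so `passes_n_filter` simply assumes it is.

```python
def n_percent(seq: str) -> float:
    """Percentage of uncalled bases (N) in a read."""
    if not seq:
        return 0.0
    return 100.0 * seq.upper().count("N") / len(seq)


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases; the full read length is the denominator (assumption)."""
    if not seq:
        return 0.0
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def passes_n_filter(seq: str, max_n_pct: float) -> bool:
    """Hypothetical filter: keep reads whose N percentage does not exceed the cutoff."""
    return n_percent(seq) <= max_n_pct


if __name__ == "__main__":
    read = "ACGTNNGGCA"
    print(n_percent(read))   # 20.0 (2 of 10 bases are N)
    print(gc_percent(read))  # 50.0 (3 G + 2 C out of 10 bases)
```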

Prerequisites

A CyVerse account. (Register for a CyVerse account at https://user.cyverse.org/.)
An up-to-date Java-enabled web browser. (Firefox is recommended. If you wish to work with your own large datasets and upload them using iCommands, Chrome is not suitable because of its issues with 64-bit Java.)

Inputs

Single-end input files or left files for paired-end data (Mandatory): FASTQ files (.fastq, .fq)
Right files for paired-end data (Optional): FASTQ files (.fastq, .fq)
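In paired-end data the left and right files normally list mates in the same order. As an illustration of that pairing (not RseqFilt's own parser), the Python sketch below reads two FASTQ streams record by record and stops with an error if one file runs out before the other. It assumes the plain four-line FASTQ layout without line-wrapped sequences; `read_fastq` and `read_pairs` are hypothetical helper names.

```python
from itertools import zip_longest
from typing import Iterator, NamedTuple, TextIO, Tuple


class FastqRecord(NamedTuple):
    header: str  # read ID without the leading '@'
    seq: str
    qual: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def read_pairs(left: TextIO, right: TextIO) -> Iterator[Tuple[FastqRecord, FastqRecord]]:
    """Yield (left, right) mates; fails if the files hold different read counts."""
    for r1, r2 in zip_longest(read_fastq(left), read_fastq(right)):
        if r1 is None or r2 is None:
            raise ValueError("left and right files contain different numbers of reads")
        yield r1, r2
```

Reading both files lazily keeps memory use flat regardless of file size, which matters for typical RNA-seq inputs.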

Parameters (Optional)

Quality Value Format: quality value format [1 = Illumina 1.8, 2 = Illumina 1.3, 3 = Sanger]. If no format is provided, it is detected automatically from the sequence data
Filter reads containing the given percentage of uncalled bases (N)
Trim the adapter and truncate the read sequence
Filter reads shorter than the minimum size
Truncate the read sequence if it matches the adapter sequence at or above a given fraction (0.0-1.0) [default: 0.9]
Filter a read if the average quality of its bases is lower than the threshold (1-40) [default: 20]
Trim low-quality reads instead of discarding them [default: unchecked]
Window size for trimming reads (5'->3'). This option should always be set when the -trim option is defined [default: 5]
Minimum length of the reads to retain after trimming
Number of CPUs [default: 4]
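The quality options above interact: scores are decoded using the quality format, then a read is either dropped for a low average quality or, when trimming is enabled, cut with a sliding window. The Python sketch below shows one common way to do each step. It is an assumption-laden illustration, not RseqFilt's documented algorithm: the offset detection is a heuristic, the window rule (cut at the first 5'->3' window whose mean falls below the threshold) is borrowed from general practice, and reusing the average-quality threshold for the window is a guess. Only the defaults 20 and 5 come from the parameter list; all function names are hypothetical.

```python
from typing import List, Optional, Tuple


def detect_offset(qualities: List[str]) -> int:
    """Heuristically guess the ASCII offset from a sample of quality strings.

    Phred+33 (Sanger, Illumina 1.8+) can use characters below ';' (ASCII 59);
    Phred+64 (Illumina 1.3) starts at '@' (ASCII 64). A character below 59
    therefore implies Phred+33. High-quality Phred+33 data can still be
    misread as Phred+64, so this is a guess, not a proof.
    """
    lowest = min(ord(c) for q in qualities for c in q)
    return 33 if lowest < 59 else 64


def phred_scores(qual: str, offset: int) -> List[int]:
    return [ord(c) - offset for c in qual]


def mean_quality(qual: str, offset: int) -> float:
    scores = phred_scores(qual, offset)
    return sum(scores) / len(scores) if scores else 0.0


def window_trim(seq: str, qual: str, offset: int,
                window: int = 5, threshold: float = 20) -> Tuple[str, str]:
    """Scan 5'->3' and cut the read at the first window whose mean quality
    falls below the threshold. Reads shorter than one window are unchanged."""
    scores = phred_scores(qual, offset)
    for start in range(max(len(scores) - window + 1, 0)):
        if sum(scores[start:start + window]) / window < threshold:
            return seq[:start], qual[:start]
    return seq, qual


def filter_or_trim(seq: str, qual: str, offset: int, *, min_len: int,
                   min_avg: float = 20, trim: bool = False,
                   window: int = 5) -> Optional[Tuple[str, str]]:
    """Return the kept (possibly trimmed) read, or None if it is discarded."""
    if trim:
        # Trim mode: the window threshold reuses min_avg (an assumption).
        seq, qual = window_trim(seq, qual, offset, window, min_avg)
    elif mean_quality(qual, offset) < min_avg:
        # Filter mode: drop the whole read if its average quality is too low.
        return None
    return (seq, qual) if len(seq) >= min_len else None
```

For a read with quality string `IIIII#####` (Phred+33, five bases at Q40 then five at Q2), the average is exactly 21, so filter mode keeps it whole, while trim mode cuts it to three bases because the window starting at position 3 averages 17.2.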

Outputs

Output file format (fastq/fasta) [default: fastq]
Generate figures: the default is False (unchecked), so no figures are produced. Check this box to generate figures
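Choosing fasta instead of fastq drops the per-base quality line from each record. A minimal Python sketch of the two record layouts, using a hypothetical `format_record` helper that is not part of RseqFilt:

```python
def format_record(header: str, seq: str, qual: str, fmt: str = "fastq") -> str:
    """Serialize one read; FASTA output simply omits the quality string."""
    if fmt == "fastq":
        return f"@{header}\n{seq}\n+\n{qual}\n"
    if fmt == "fasta":
        return f">{header}\n{seq}\n"
    raise ValueError(f"unsupported output format: {fmt!r}")
```

Keeping fastq output is the safer choice if the cleaned reads will pass through any later quality-aware step, since qualities cannot be recovered from FASTA.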

Test Data

The test data for RseqFilt is located at /iplant/home/shared/iplantcollaborative/example_data/RseqFilt

Inputs

Single-end input files or left files for paired-end data (Mandatory): /iplant/home/shared/iplantcollaborative/example_data/RseqFilt/sample_R1_001.fastq
Right files for paired-end data (Optional): /iplant/home/shared/iplantcollaborative/example_data/RseqFilt/sample_R2_001.fastq

Parameters (Optional)

Leave all parameters at their default values

Outputs

Leave the outputs at their default values

Successful completion of the analysis produces two folders, `logs` and `sample_R1_001_filtering_out`. The `sample_R1_001_filtering_out` folder contains the following files:

Command.log
Statistics.txt
sample_R1_001_Basedist.png
sample_R1_001_Clean.fastq
sample_R1_001_GCdist.png
sample_R1_001_QualGroup.png
sample_R1_001_Qualdist.png
sample_R2_001_Basedist.png
sample_R2_001_Clean.fastq
sample_R2_001_GCdist.png
sample_R2_001_QualGroup.png
sample_R2_001_Qualdist.png
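After downloading the `sample_R1_001_filtering_out` folder, one quick way to confirm the test run produced everything listed above is to check for each expected file name. This Python sketch hard-codes the names from this listing (test data, default settings); the `missing_outputs` helper is hypothetical, and the expected names would differ if figures were not generated or fasta output was selected.

```python
from pathlib import Path
from typing import Iterable, List

# Per-sample suffixes and shared files, taken from the listing above.
PER_SAMPLE_SUFFIXES = ["_Basedist.png", "_Clean.fastq", "_GCdist.png",
                       "_QualGroup.png", "_Qualdist.png"]
SHARED_FILES = ["Command.log", "Statistics.txt"]


def missing_outputs(out_dir: Path, samples: Iterable[str]) -> List[str]:
    """Return the expected file names that are absent from out_dir."""
    expected = list(SHARED_FILES)
    for sample in samples:
        expected += [sample + suffix for suffix in PER_SAMPLE_SUFFIXES]
    return [name for name in expected if not (out_dir / name).is_file()]
```

For the test data, calling `missing_outputs(Path("sample_R1_001_filtering_out"), ["sample_R1_001", "sample_R2_001"])` should return an empty list.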

RseqFilt-1.0