RseqFilt-1.0
For an introduction to using the DE, see Using the Discovery Environment.
Please work through the tutorial and add your comments on the bottom of this page, or use the intercom button at the bottom right of this page to post your question. Thank you.
Rationale and background
RseqFilt is an automated sequence-filtering tool for single-end and paired-end high-throughput RNA-seq data generated on Illumina sequencing platforms.
Features
- Simultaneously filter and/or trim reads for adapter or primer contamination, uncalled bases (N), GC content, and low-quality reads
- Supports single and paired-end reads
- Analyze multiple samples simultaneously
- Parallel computation for accelerating the speed of analysis
- Visualization and statistics
- No dependency on an external open-source tool
Prerequisites
- A CyVerse account. (Register for a CyVerse account at https://user.cyverse.org/.)
- An up-to-date Java-enabled web browser. (Firefox recommended. If you wish to work with your own large datasets and upload them using iCommands, Chrome is not suitable due to its issues in utilizing 64-bit Java.)
Inputs
- Single-end input files or left files for paired-end data (Mandatory): FASTQ files (.fastq, .fq)
- Right files for paired-end data (Optional): FASTQ files (.fastq, .fq)
Parameters (Optional)
- Quality Value Format: Quality value format [1 = Illumina 1.8, 2 = Illumina 1.3, 3 = Sanger]. If the quality format is not provided, it is detected automatically from the sequence data
- Filter the reads containing a given percentage of uncalled bases (N)
- Trim the adapter and truncate the read sequence
- Filter the reads that are shorter than the minimum size
- Truncate the read sequence if it matches the adapter sequence at or above a given fraction (0.0-1.0) [default: 0.9]
- Filter the read sequence if the average quality of its bases is lower than the threshold (1-40) [default: 20]
- Trim low-quality reads instead of discarding them. The box is unchecked by default
- The window size for trimming reads (5'->3'). This option should always be set when the -trim option is defined [default: 5]
- Minimum length of the reads to retain after trimming
- Number of CPUs [default is 4]
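The parameters above can be easier to reason about with a small worked example. The Python sketch below is illustrative only and is not RseqFilt's implementation: all function names are hypothetical, the offset guess is a common heuristic (characters below ASCII 59 cannot occur in Phred+64 data), and the `max_n` and `min_length` defaults are assumed values chosen for the example. Only the quality threshold of 20 and the window size of 5 come from the defaults listed above.

```python
from statistics import mean


def guess_phred_offset(quality_strings):
    """Guess the ASCII offset (33 or 64) from the lowest quality character seen."""
    lowest = min(ord(ch) for q in quality_strings for ch in q)
    # Heuristic: characters below ';' (ASCII 59) cannot occur in Phred+64 data.
    return 33 if lowest < 59 else 64


def phred_scores(quality, offset=33):
    """Convert a FASTQ quality string to a list of integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def n_fraction(seq):
    """Fraction of uncalled bases (N) in a read; an empty read counts as 1.0."""
    return seq.upper().count("N") / len(seq) if seq else 1.0


def trim_3prime(seq, quality, window=5, threshold=20, offset=33):
    """Scan 5'->3' and cut the read at the first window whose mean quality
    is below the threshold, keeping everything before that window."""
    scores = phred_scores(quality, offset)
    for start in range(max(len(scores) - window + 1, 0)):
        if mean(scores[start:start + window]) < threshold:
            return seq[:start], quality[:start]
    return seq, quality  # no low-quality window found


def keep_read(seq, quality, max_n=0.1, min_quality=20, min_length=20, offset=33):
    """Return True if the read passes the length, N-content, and mean-quality filters.
    max_n and min_length are assumed example values, not RseqFilt defaults."""
    if not seq or len(seq) < min_length:
        return False
    if n_fraction(seq) > max_n:
        return False
    return mean(phred_scores(quality, offset)) >= min_quality
```

For example, `keep_read("ACGT" * 5, "I" * 20)` passes all three filters, while the same sequence with a quality string of `"#" * 20` (Phred 2 at offset 33) is rejected by the mean-quality check.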
Outputs
- Output file format (fastq/fasta) [default: fastq]
- No figures will be produced. The default is False (No figures will be produced). Check this box to generate figures
Test Data
The test data for RseqFilt is located at /iplant/home/shared/iplantcollaborative/example_data/RseqFilt
Inputs
- Single-end input files or left files for paired-end data (Mandatory): /iplant/home/shared/iplantcollaborative/example_data/RseqFilt/sample_R1_001.fastq
- Right files for paired-end data (Optional): /iplant/home/shared/iplantcollaborative/example_data/RseqFilt/sample_R2_001.fastq
Parameters (Optional)
Leave all the parameters at their defaults.
Outputs
Leave the outputs at their defaults.
Successful completion of the analysis results in two folders, `logs` and `sample_R1_001_filtering_out`. The `sample_R1_001_filtering_out` folder contains the following files:
- Command.log
- Statistics.txt
- sample_R1_001_Basedist.png
- sample_R1_001_Clean.fastq
- sample_R1_001_GCdist.png
- sample_R1_001_QualGroup.png
- sample_R1_001_Qualdist.png
- sample_R2_001_Basedist.png
- sample_R2_001_Clean.fastq
- sample_R2_001_GCdist.png
- sample_R2_001_QualGroup.png
- sample_R2_001_Qualdist.png
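After downloading the output folder, a quick sanity check is to count the reads in the cleaned FASTQ files. The sketch below is a minimal example, assuming uncompressed FASTQ files with exactly four lines per record; the function name `count_fastq_reads` is hypothetical and is not part of RseqFilt.

```python
def count_fastq_reads(path):
    """Count records in an uncompressed FASTQ file with four lines per record."""
    with open(path) as handle:
        # Ignore blank lines, such as a trailing empty line at the end of the file.
        lines = sum(1 for line in handle if line.strip())
    if lines % 4 != 0:
        raise ValueError(f"{path}: {lines} lines is not a multiple of 4")
    return lines // 4


if __name__ == "__main__":
    # File names taken from the test-data output listed above.
    for name in ("sample_R1_001_Clean.fastq", "sample_R2_001_Clean.fastq"):
        try:
            print(name, count_fastq_reads(name))
        except FileNotFoundError:
            print(name, "not found in the current directory")
```

Comparing the two counts shows how many reads survived filtering in each file. Whether the counts must be equal depends on how RseqFilt handles read pairs, which this page does not state.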