RseqFilt-1.0
Rationale and background
RseqFilt is an automated sequence filtering and analysis tool for single-end and paired-end high-throughput RNA-seq data generated on Illumina sequencing platforms.
Features
Simultaneously filter and/or trim reads for adapter or primer contamination, uncalled bases (N), GC content, and low-quality reads
Supports single-end and paired-end reads
Analyze multiple samples simultaneously
Parallel computation to speed up analysis
Visualization and statistics
No dependency on external open-source tools
Prerequisites
A CyVerse account. (Register for a CyVerse account at https://user.cyverse.org/.)
An up-to-date Java-enabled web browser. (Firefox recommended. If you wish to work with your own large datasets and upload them using iCommands, Chrome is not suitable due to its issues in utilizing 64-bit Java.)
Inputs
Single-end input files or left files for paired-end data (Mandatory): FASTQ files (.fastq, .fq)
Right files for paired-end data (Optional): FASTQ files (.fastq, .fq)
Parameters (Optional)
Quality Value Format: quality value format [1 = Illumina 1.8, 2 = Illumina 1.3, 3 = Sanger]. If the quality format is not provided, it is detected automatically from the sequence data
Filter reads containing a given percentage of uncalled bases (N)
Trim the adapter and truncate the read sequence
Filter reads that are shorter than the minimum size
Truncate the read sequence if it matches the adapter sequence at or above a given fraction (0.0-1.0) [default: 0.9]
Filter the read sequence if the average quality of its bases is lower than the threshold (1-40) [default: 20]
Trim low-quality reads instead of discarding them. The box is unchecked by default
The window size for trimming reads in the 5'->3' direction. This option should always be set when the -trim option is defined [default: 5]
Minimum length of the reads to retain after trimming
Number of CPUs [default is 4]
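The filtering and trimming parameters above can be illustrated with a minimal Python sketch. This is not RseqFilt's own code: the function names, the `max_n_percent` and `min_len` defaults, the direction of the comparisons, and the exact windowing rule are assumptions made for illustration. Only the quality threshold of 20, the window size of 5, and the mapping of quality formats to Phred offsets (Illumina 1.8 and Sanger use offset 33, Illumina 1.3 uses offset 64) are taken from the parameter descriptions or are standard FASTQ conventions.

```python
# Illustrative sketch only; RseqFilt's actual implementation may differ.

# Quality value format option -> ASCII offset of the Phred encoding.
PHRED_OFFSETS = {1: 33, 2: 64, 3: 33}  # 1 = Illumina 1.8, 2 = Illumina 1.3, 3 = Sanger


def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def n_percent(seq):
    """Percentage of uncalled bases (N) in a read."""
    return 100.0 * seq.upper().count("N") / len(seq) if seq else 0.0


def mean_quality(quality, offset=33):
    """Average Phred score over all bases of a read."""
    scores = phred_scores(quality, offset)
    return sum(scores) / len(scores) if scores else 0.0


def window_trim(seq, quality, window=5, threshold=20, offset=33):
    """Scan 5'->3' and cut the read where a window's mean quality first drops below threshold."""
    scores = phred_scores(quality, offset)
    if not scores:
        return seq, quality
    for start in range(max(len(scores) - window + 1, 1)):
        chunk = scores[start:start + window]
        if sum(chunk) / len(chunk) < threshold:
            return seq[:start], quality[:start]
    return seq, quality


def keep_read(seq, quality, max_n_percent=10.0, min_mean_q=20, min_len=20, offset=33):
    """Return True if the read passes the length, N-content, and mean-quality filters.

    max_n_percent and min_len are hypothetical defaults chosen for this sketch.
    """
    return (
        len(seq) >= min_len
        and n_percent(seq) <= max_n_percent
        and mean_quality(quality, offset) >= min_mean_q
    )
```

For example, `keep_read(seq, qual, offset=PHRED_OFFSETS[2])` would apply the filters to a read encoded in the Illumina 1.3 format, and `window_trim(seq, qual)` would return the trimmed sequence together with its matching quality string.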
Outputs
Output file format (fastq/fasta) [default:fastq]
No figures will be produced: the default is False (unchecked), so figures are generated, as in the test run with default settings. Check this box to suppress figure generation
Test Data
The test data for RseqFilt is located at /iplant/home/shared/iplantcollaborative/example_data/RseqFilt
Inputs
Single end input files or left files for paired-end data (Mandatory): /iplant/home/shared/iplantcollaborative/example_data/RseqFilt/sample_R1_001.fastq
Right files for paired-end data (Optional): /iplant/home/shared/iplantcollaborative/example_data/RseqFilt/sample_R2_001.fastq
Parameters (Optional)
Leave all the parameters at their defaults
Outputs
Leave the outputs at their defaults
Successful completion of the analysis results in two folders: `logs` and `sample_R1_001_filtering_out`. The `sample_R1_001_filtering_out` folder contains the following files:
Command.log
Statistics.txt
sample_R1_001_Basedist.png
sample_R1_001_Clean.fastq
sample_R1_001_GCdist.png
sample_R1_001_QualGroup.png
sample_R1_001_Qualdist.png
sample_R2_001_Basedist.png
sample_R2_001_Clean.fastq
sample_R2_001_GCdist.png
sample_R2_001_QualGroup.png
sample_R2_001_Qualdist.png
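After downloading the output folder, a short Python sketch such as the following could be used to confirm that a test run produced every file listed above. The helper names `expected_outputs` and `missing_outputs` are hypothetical, and the naming pattern is inferred only from the file list for the default settings (FASTQ output with figures enabled); other settings may produce different names.

```python
from pathlib import Path

# Plot suffixes seen in the test output listing above.
PLOTS = ("Basedist", "GCdist", "QualGroup", "Qualdist")


def expected_outputs(sample_names):
    """Build the list of file names expected for the given sample names (default settings assumed)."""
    files = ["Command.log", "Statistics.txt"]
    for name in sample_names:
        files.append(f"{name}_Clean.fastq")
        files.extend(f"{name}_{plot}.png" for plot in PLOTS)
    return sorted(files)


def missing_outputs(out_dir, sample_names):
    """Return the expected file names that are not present in out_dir."""
    folder = Path(out_dir)
    present = {p.name for p in folder.iterdir()} if folder.is_dir() else set()
    return [f for f in expected_outputs(sample_names) if f not in present]


if __name__ == "__main__":
    missing = missing_outputs("sample_R1_001_filtering_out", ["sample_R1_001", "sample_R2_001"])
    print("All expected files found" if not missing else f"Missing: {missing}")
```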