Sickle-quality-based-trimming version 1.0

An application for trimming FASTQ files based on quality values. Sickle performs quality trimming on both paired-end and single-end reads:

For paired-end reads, both input read files must be provided, along with the two corresponding output file names and a file name for reads that become orphaned after trimming.

For single-end reads, only the first input sequence file and the first output file are required.
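The difference between the two modes can be sketched as the command line each one needs. This is a minimal, hypothetical helper (`build_sickle_command` is not part of the app); it assumes the `sickle se` / `sickle pe` subcommands and the `-f`, `-r`, `-t`, `-o`, `-p`, `-s`, `-q`, `-l` and `-x` options from Sickle's command-line help, and the Discovery Environment app may wrap them differently. The output names "Output2" and "singles.fastq" used below are placeholders.

```python
from typing import List, Optional


def build_sickle_command(
    forward: str,
    output1: str,
    qual_type: str,
    reverse: Optional[str] = None,
    output2: Optional[str] = None,
    singles: Optional[str] = None,
    qual_threshold: int = 20,
    length_threshold: Optional[int] = None,
    no_fiveprime: bool = False,
) -> List[str]:
    """Hypothetical helper: build a Sickle argument list ('pe' if a reverse file is given, else 'se')."""
    if qual_type not in ("sanger", "illumina", "solexa"):
        raise ValueError(f"unknown quality type: {qual_type}")
    if reverse is None:
        # Single-end: only the first input and the first output are required
        cmd = ["sickle", "se", "-f", forward, "-t", qual_type, "-o", output1]
    else:
        # Paired-end: both output files and the orphaned-singles file are required
        if output2 is None or singles is None:
            raise ValueError("paired-end mode needs output2 and singles file names")
        cmd = ["sickle", "pe", "-f", forward, "-r", reverse, "-t", qual_type,
               "-o", output1, "-p", output2, "-s", singles]
    cmd += ["-q", str(qual_threshold)]
    if length_threshold is not None:
        cmd += ["-l", str(length_threshold)]
    if no_fiveprime:
        cmd.append("-x")  # skip 5' trimming
    return cmd
```

For example, `build_sickle_command("reads_1.fastq", "Output1", "sanger")` returns a single-end `se` command, while also passing `reverse`, `output2` and `singles` switches to `pe` mode; leaving out either paired-end output name raises an error, mirroring the requirements above.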

 

The application is available in the CyVerse Discovery Environment.

App Creator

Blake Joyce

Quick Start

Test Data

Test data for this app is available directly in the Discovery Environment, in the Data window under Community Data -> iplantcollaborative -> example_data -> Sickle.
Full path to example data: /iplant/home/shared/iplantcollaborative/example_data/Sickle

The test data folder contains input reads with Sanger-format quality scores, along with the Sickle output from a successful run of the app.

Input File(s) in the 'Settings' tab

Use reads_1.fastq and reads_2.fastq from the directory above as test input.

Parameters Used in the 'Options' tab

Required

In the 'Quality format' field, it is essential to select the correct quality type (sanger, illumina, or solexa); otherwise Sickle will fail. The default selection is 'Illumina'.

* The example dataset provided at "/iplant/home/shared/iplantcollaborative/example_data/Sickle" uses Sanger quality encoding. Please choose 'Sanger' in the dropdown if you are using the example dataset.
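Why the quality type matters can be shown by decoding the same quality characters with each type's offset. This is a minimal sketch using the standard FASTQ offsets (Sanger/Phred+33, Illumina 1.3 to 1.7/Phred+64, Solexa+64); it treats Solexa values as raw scores without converting them to the Phred scale, and the helper names and the encoding heuristic are hypothetical, not part of Sickle.

```python
from typing import List

# Standard FASTQ quality offsets for the types offered in 'Quality format'
QUALITY_OFFSETS = {
    "sanger": 33,    # Phred+33 (also used by Illumina CASAVA 1.8 and later)
    "illumina": 64,  # Phred+64 (Illumina CASAVA 1.3 to 1.7)
    "solexa": 64,    # Solexa+64 (older pipelines; odds-based scale, not Phred)
}


def decode_qualities(qual_string: str, qual_type: str) -> List[int]:
    """Convert a FASTQ quality string into integer scores for the given type."""
    offset = QUALITY_OFFSETS[qual_type]
    return [ord(ch) - offset for ch in qual_string]


def looks_like_sanger(qual_string: str) -> bool:
    """Heuristic: characters below ';' (ASCII 59) cannot occur in +64 encodings.

    A False result does NOT prove a +64 encoding; high-quality Phred+33
    data may simply contain no low characters.
    """
    return any(ord(ch) < 59 for ch in qual_string)
```

The character "I" decodes to 40 under 'sanger' but to 9 under 'illumina', and a low Phred+33 character such as "#" becomes a negative score under the +64 types, so choosing the wrong type changes every trimming decision.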

Optional

You can set a quality threshold (default: 20) and the minimum length a read must have after trimming to be kept.

* For the example dataset, you can use 20 for the quality threshold and leave the minimum read length blank.
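To illustrate what the quality threshold and minimum read length control, here is a simplified sketch of windowed quality trimming on already-decoded scores. Sickle's README describes sliding a window whose length is 0.1 times the read length (or the whole read if that is less than 1); the cut-point rules below follow that description only loosely, and `trim_read` is a hypothetical helper, not Sickle's code. The sketch's default length threshold of 20 is an assumption.

```python
from typing import List, Optional, Tuple


def trim_read(
    quals: List[int],
    qual_threshold: int = 20,
    length_threshold: int = 20,
    five_prime: bool = True,
) -> Optional[Tuple[int, int]]:
    """Return the (start, end) slice of the read to keep, or None to discard it."""
    n = len(quals)
    if n == 0:
        return None
    # Window is 10% of the read length; very short reads use one whole-read window
    window = int(0.1 * n)
    if window < 1:
        window = n

    def window_avg(i: int) -> float:
        return sum(quals[i:i + window]) / window

    start = 0
    if five_prime:
        # 5' end: slide until the window average reaches the threshold, then cut
        # at the first base inside that window that is itself at/above threshold
        start = None
        for i in range(n - window + 1):
            if window_avg(i) >= qual_threshold:
                start = next(j for j in range(i, i + window) if quals[j] >= qual_threshold)
                break
        if start is None:
            return None  # no window ever reaches the threshold

    end = n
    # 3' end: from the 5' cut, find the first window whose average drops below
    # the threshold and cut at the first low-quality base inside it
    for i in range(start, n - window + 1):
        if window_avg(i) < qual_threshold:
            end = next(j for j in range(i, i + window) if quals[j] < qual_threshold)
            break

    # Discard the read if too little sequence survives trimming
    if end - start < length_threshold:
        return None
    return start, end
```

For a 20-base read with two low-quality bases at each end, `trim_read([2, 2] + [30] * 16 + [2, 2], length_threshold=10)` keeps bases 2 to 17 and returns `(2, 18)`; with a length threshold of 20 the same read is discarded.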

You can also choose to discard reads containing any Ns and to turn off 5' trimming.

* For the example dataset, you can leave these turned on.
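The "discard reads with any Ns" option can be pictured as a filter over FASTQ records. This is a minimal sketch of the behavior described above, not Sickle's implementation; it assumes unwrapped four-line FASTQ records, and `parse_fastq` and `discard_reads_with_n` are hypothetical helpers.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, '+' line, qualities


def parse_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records; an incomplete trailing record is ignored."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        yield header, seq, plus, qual


def discard_reads_with_n(records: Iterable[Record]) -> List[Record]:
    """Keep only records whose sequence contains no 'N' (case-insensitive)."""
    return [r for r in records if "N" not in r[1].upper()]
```

Applied to two records with sequences "ACGT" and "ACNT", only the first record is kept.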

Output File(s)

The output files are Output file 1, Output file 2, and Single Read Output.

Output file 1 (default name: Output1) will contain the trimmed forward reads for paired-end data, or the trimmed reads for single-end data.

Output file 2 (no default name) will contain trimmed reverse reads for paired-end data. 

Single Read Output (no default name) will contain trimmed reads whose mate was discarded because it did not meet the specified minimum length. This output is only generated for paired-end data.

When run on the example data, the app should produce all three output files; however, the Single Read Output will be empty because no minimum read length was specified.
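How paired reads end up in the three paired-end outputs can be sketched as a routing rule. This assumes each read has already been quality-trimmed and that `passes` is a hypothetical stand-in for the minimum-length check; `route_pairs` illustrates the behavior described above rather than Sickle's code.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (trimmed forward read, trimmed reverse read)


def route_pairs(pairs: List[Pair], passes: Callable[[str], bool]) -> Dict[str, List[str]]:
    """Sort trimmed read pairs into Output file 1, Output file 2 and Single Read Output."""
    outputs: Dict[str, List[str]] = {"output1": [], "output2": [], "singles": []}
    for fwd, rev in pairs:
        fwd_ok, rev_ok = passes(fwd), passes(rev)
        if fwd_ok and rev_ok:
            # Both mates survive: they stay paired in the two main outputs
            outputs["output1"].append(fwd)
            outputs["output2"].append(rev)
        elif fwd_ok:
            outputs["singles"].append(fwd)  # reverse mate failed: forward read is orphaned
        elif rev_ok:
            outputs["singles"].append(rev)  # forward mate failed: reverse read is orphaned
        # If neither read passes, the pair is dropped entirely
    return outputs
```

With a check such as `lambda s: len(s) >= 5`, a pair of two 8-base reads goes to the paired outputs, a pair with one 2-base mate sends the longer read to the singles output, and a pair of two short reads is dropped.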

Tool Source for App

https://github.com/najoshi/sickle

For more information about Sickle and pre-processing sequences, please visit the Pre-processing Sequencing Reads page on the CyVerse Wiki in the Genome and Transcript Assembly space, and the Evaluate and Pre-Process Sequencing Reads (Workflow Tutorial).