FastQC-0.11.5 (multi-file)

Alert:

 

The CyVerse App Store is currently being restructured, and apps are being moved to an HPC environment. During this transition, users may occasionally be unable to locate or use apps that are listed in our tutorials. In many cases, these apps can be found with the search bar at the top of the Apps window in the DE. To improve the chance of a match, search for only the part of the name that refers to the app's function or origin rather than the full app name and version number (e.g. 'SOAPdenovo' instead of 'SOAPdenovo-Trans 1.01').

Also, as part of the 2.8 app categorization, a number of apps were deprecated and are no longer available, and there is no longer an Archive category. You can look for a suitable replacement in the List of Applications in this window, or search for an app name or the tool an app uses in the Apps window search field. If you need an app reinstated, please contact support@cyverse.org.

Please work through the documentation and add your comments at the bottom of this page, or email comments to upendra@cyverse.org. Thank you.

 

FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses which you can use to give a quick impression of whether your data has any problems of which you should be aware before doing any further analysis.

The main functions of FastQC are:

  • Import of data from BAM, SAM or FastQ files (any variant)
  • Providing a quick overview to tell you in which areas there may be problems
  • Summary graphs and tables to quickly assess your data
  • Export of results to an HTML based permanent report
  • Offline operation to allow automated generation of reports without running the interactive application
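The offline mode listed above can also be driven from a script when FastQC is installed locally rather than run through the DE. The following is a minimal sketch, assuming the `fastqc` executable is on your PATH; the helper name `build_fastqc_command` and the output directory `fastqc_out` are hypothetical, while `--outdir`, `--threads` and `--extract` are standard FastQC command-line options.

```python
import shlex


def build_fastqc_command(inputs, outdir, threads=1, extract=True):
    """Return the argument list for a non-interactive FastQC run.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    cmd = ["fastqc", "--outdir", str(outdir), "--threads", str(threads)]
    if extract:
        # Ask FastQC to unzip the report directory after the run.
        cmd.append("--extract")
    # Several input files can be passed in one call (multi-file use).
    cmd.extend(str(path) for path in inputs)
    return cmd


if __name__ == "__main__":
    cmd = build_fastqc_command(["SRR070572_hy5.fastq"], "fastqc_out", threads=2)
    print(shlex.join(cmd))
    # To execute it, the output directory must already exist:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.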

Note

FastQC detects the quality encoding of your FASTQ reads (Sanger / Illumina 1.9, Illumina 1.5, etc.) and lists the detected encoding on the output graphs. PHRED33 is exactly the same as Sanger / Illumina 1.9, so if you use these FASTQ files in downstream apps such as the FASTX toolkit, select PHRED33 as the format type when your reads are reported as Sanger / Illumina 1.9.
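For reference, Phred+33 encoding stores each quality score as the ASCII character whose code is the score plus 33. The sketch below shows how such a quality line can be decoded; the function names are illustrative and are not part of FastQC or the FASTX toolkit.

```python
def phred33_scores(quality_line):
    """Decode a Phred+33 quality string into integer quality scores."""
    return [ord(ch) - 33 for ch in quality_line]


def error_probability(q):
    """Convert a Phred quality score into the probability of a base-call error."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    # '!' is ASCII 33, '5' is ASCII 53 and 'I' is ASCII 73.
    for q in phred33_scores("!5I"):
        print(q, error_probability(q))
```

A file that uses an older Phred+64 encoding would need 64 subtracted instead of 33, which is why choosing the right format type in downstream apps matters.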

Test Data

All files are located in the Community Data directory of the CyVerse (formerly iPlant) Discovery Environment at the following path:

Community Data > iplantcollaborative > example_data > fastqc

Input File(s)

Use SRR070572_hy5.fastq as test data.

Output File(s)

All outputs can be found in the directory Community Data > iplantcollaborative > example_data > fastqc

  • Expect the following outputs (in addition to the logs generated for all analyses):
    • A directory named after the input file
    • A zipped copy of this directory
  • Within the generated directory (SRR070572_hy5_fastqc for the example above), there are two subdirectories and several files.
    • The subdirectories are icons (not scientifically necessary) and images.
    • The files generated in this directory are fastqc_data.txt, fastqc_report.html and summary.txt.
  • Within the images directory, the following files should be available:
    • duplication_levels.png
    • kmer_profiles.png
    • per_base_gc_content.png
    • per_base_n_content.png
    • per_base_quality.png
    • per_base_sequence_content.png
    • per_sequence_gc_content.png
    • per_sequence_quality.png
    • sequence_length_distribution.png
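When several files are analyzed, the summary.txt file in each output directory offers a quick way to see which modules need attention without opening every HTML report. The sketch below assumes the usual layout of summary.txt, one tab-separated line per module holding the status (PASS, WARN or FAIL), the module name and the input file name; the function names are hypothetical and the sample lines are made-up values, not actual results for the test data.

```python
def parse_summary(text):
    """Parse the contents of a FastQC summary.txt into (status, module, filename) tuples."""
    results = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        status, module, filename = line.split("\t", 2)
        results.append((status, module, filename))
    return results


def flagged_modules(results):
    """Return the names of modules whose status is anything other than PASS."""
    return [module for status, module, _ in results if status != "PASS"]


if __name__ == "__main__":
    # Illustrative content only; real statuses depend on your data.
    sample = (
        "PASS\tBasic Statistics\tSRR070572_hy5.fastq\n"
        "WARN\tPer base sequence content\tSRR070572_hy5.fastq\n"
    )
    print(flagged_modules(parse_summary(sample)))
```

To use it on a real run, read the file first, for example with `open("SRR070572_hy5_fastqc/summary.txt").read()`, and pass the text to `parse_summary`.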

Tool Source for App