GBS
Genotyping by Sequencing Workflow
Rationale and background:
Genotyping by Sequencing (GBS) is an approach for obtaining large amounts of variation data across the entire genome via high-throughput sequencing. GBS is a laboratory technique, not an analysis: a specific library-preparation protocol must be followed in the lab to generate the proper data, so the workflow cannot be applied to whole-genome sequencing data. GBS sequences a reduced representation of the genome rather than the entire genome. Because it cuts the DNA at known recognition sites with restriction enzymes, it ensures that the same regions are sequenced across all individuals.
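To make the reduced-representation idea concrete, here is a minimal Python sketch of an in silico digest with ApeKI, the enzyme used later in this tutorial. It assumes ApeKI's recognition site is G^CWGC (W = A or T, cut after the first G). As a simplification, it scans only the top strand. The function name `digest` is hypothetical and is not part of TASSEL or the App.

```python
import re

# ApeKI recognition site G^CWGC (W = A or T); "^" marks the top-strand cut.
# The lookahead lets overlapping sites (e.g. GCAGCTGC) all be found.
APEKI_SITE = re.compile(r"(?=GC[AT]GC)")
CUT_OFFSET = 1  # the cut falls after the first base of the 5-bp site


def digest(sequence: str) -> list[str]:
    """Split a sequence at every ApeKI site (simplified: top strand only)."""
    seq = sequence.upper()
    cuts = [m.start() + CUT_OFFSET for m in APEKI_SITE.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    # Consecutive cut positions delimit the restriction fragments.
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


print(digest("AAGCAGCTTTTGCTGCAA"))  # ['AAG', 'CAGCTTTTG', 'CTGCAA']
```

Fragment boundaries depend only on where the recognition sites fall. Two individuals that differ by a SNP inside a fragment, such as `AAGCAGCTTTTGCTGCAA` and `AAGCAGCTATTGCTGCAA`, therefore yield fragments with the same boundaries, provided the SNP does not create or destroy a site. This is why the same loci can be compared across samples.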
The technique was developed in Ed Buckler's lab at Cornell.
The GBS workflow comprises multiple steps, but they have been merged into a single App in the Discovery Environment (DE).
Specific Objectives
By the end of this module, you should:
- Be more familiar with the DE user interface
- Be able to obtain variation data from GBS sequence data
The staged sequence data can be found in:
'Community Data'->iplant_training->gbs_workflow->input
Step 1: Run GBS Workflow
The GBS Workflow App is built on top of the TASSEL software package and the BWA aligner.
Diagram of the GBS pipeline
The "Genotyping by Sequencing Workflow" App is located under
Public Apps->QTL and GWAS->TASSEL
The first step is to transform the raw sequence reads into tags.
In addition to the sequence files in FASTQ or QSEQ format, a "key file" is needed. The key file maps each sample to its barcode, flowcell and lane.
Example key file
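The key file is essentially a lookup table. The sketch below is an illustration, not the App's own code. It reads a tab-delimited key file into a `{(flowcell, lane): {barcode: sample}}` dictionary. It assumes the header names the columns Flowcell, Lane, Barcode and Sample. The actual column names in 61VBPAAXX_key.txt may differ, and the barcodes and sample names in the example are made up.

```python
import csv
import io
from collections import defaultdict


def read_key_file(handle) -> dict[tuple[str, str], dict[str, str]]:
    """Return {(flowcell, lane): {barcode: sample}} from a tab-delimited key file.

    ASSUMPTION: the header row names the columns Flowcell, Lane, Barcode and
    Sample. Check the real key file and adjust these names if they differ.
    """
    mapping = defaultdict(dict)
    for row in csv.DictReader(handle, delimiter="\t"):
        lane_key = (row["Flowcell"], row["Lane"])
        barcode = row["Barcode"]
        # A barcode must identify exactly one sample within a flowcell lane.
        if barcode in mapping[lane_key]:
            raise ValueError(f"duplicate barcode {barcode} in {lane_key}")
        mapping[lane_key][barcode] = row["Sample"]
    return dict(mapping)


# Hypothetical example: barcodes and sample names are made up for illustration.
example = io.StringIO(
    "Flowcell\tLane\tBarcode\tSample\n"
    "61VBPAAXX\t1\tCTCC\tsampleA\n"
    "61VBPAAXX\t1\tTGCA\tsampleB\n"
)
keys = read_key_file(example)
print(keys[("61VBPAAXX", "1")]["TGCA"])  # sampleB
```

Keying on flowcell and lane mirrors how the same barcode can be reused for different samples in different lanes.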
In the gbsTags - Input Data section, you can click 'Browse' at the top right of the Input file folder box to navigate to the folder containing the FASTQ or QSEQ files:
'Community Data'->iplant_training->gbs_workflow->input->qseq
Then click the 'Browse' button next to the Select key file box and navigate to the example key file:
'Community Data'->iplant_training->gbs_workflow->input->61VBPAAXX_key.txt
In the gbsTags - Options section, choose "qseq" from the Select tag format list and select "ApeKI" from the list of enzymes. ApeKI is the restriction enzyme that was used to digest the input DNA during sequencing library construction. Leave Minimum Unique Reads at 5: this is the minimum number of times a tag must occur to be retained. To make the analysis complete faster, change the End chromosome to "1".
Filled gbsTags - Options section
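Conceptually, the tag step trims each read to a tag and keeps only tags seen at least Minimum Unique Reads times. Below is a simplified Python sketch of that idea, not TASSEL's implementation. It assumes each read starts with a sample barcode followed by an ApeKI cut-site remnant (CAGC or CTGC). It also uses a fixed, illustrative tag length of 64 bases. `extract_tag`, `count_tags`, `CUT_REMNANTS` and `TAG_LENGTH` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, Optional

# ASSUMPTION: after the barcode, ApeKI-digested reads begin with one of these
# cut-site remnants (G^CWGC leaves CWGC at the start of the fragment).
CUT_REMNANTS = ("CAGC", "CTGC")
# Illustrative tag length; the App's actual setting may differ.
TAG_LENGTH = 64


def extract_tag(read: str, barcodes: set[str]) -> Optional[str]:
    """Return the fixed-length tag that follows a known barcode, or None."""
    for barcode in barcodes:
        if not read.startswith(barcode):
            continue
        insert = read[len(barcode):]
        # Keep reads whose insert starts at a cut site and is long enough.
        if insert.startswith(CUT_REMNANTS) and len(insert) >= TAG_LENGTH:
            return insert[:TAG_LENGTH]
    return None


def count_tags(reads: Iterable[str], barcodes: set[str], min_count: int = 5) -> dict[str, int]:
    """Count tags across all reads and keep those seen at least min_count times."""
    counts = Counter()
    for read in reads:
        tag = extract_tag(read, barcodes)
        if tag is not None:
            counts[tag] += 1
    return {tag: n for tag, n in counts.items() if n >= min_count}
```

With `min_count=5`, the value used in this tutorial, a tag observed only twice across all reads is discarded. Raising the threshold removes more sequencing-error tags at the cost of dropping rare genuine ones.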
The tags need to be aligned to a reference genome. Click on the gbsBWA - Input section and select "Oryza_sativa (japonica)" from the list of reference genomes. Leave the gbsBWA - Options as they are.
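Before alignment, the distinct tags have to be handed to the aligner as sequence records. As a rough illustration, this sketch writes a `{tag: count}` dictionary as FASTQ, a format BWA accepts. It is an assumption about the general approach, not a description of what the App does internally. It uses a constant placeholder quality character because collapsed tags have no per-base qualities of their own. The function name `tags_to_fastq` and the read-naming scheme `tag<N>_count<M>` are hypothetical.

```python
def tags_to_fastq(tag_counts: dict[str, int]) -> str:
    """Render {tag: count} as FASTQ text, one record per distinct tag.

    Hypothetical naming: each record is called tag<N>_count<M>. Collapsed tags
    carry no per-base qualities, so a constant placeholder ('I') is used.
    """
    records = []
    # Sort by sequence so the output order is deterministic.
    for i, (tag, count) in enumerate(sorted(tag_counts.items()), start=1):
        records.append(f"@tag{i}_count{count}\n{tag}\n+\n{'I' * len(tag)}\n")
    return "".join(records)


print(tags_to_fastq({"CAGCAATT": 7}), end="")
```

Carrying the count in the record name keeps each tag's read depth available after alignment without storing duplicate sequences.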
Also change the End chromosome to "1" in the gbsTagsToSNP - Options, gbsMergeSNP - Options and gbsFilter - Options sections.
Click the "Launch Analysis" button to submit your analysis.