EMS mutant sites identification Workflow (DE App: ems_mutation)
Conceptual overview
This workflow identifies EMS-induced mutation sites from whole-genome sequencing data.
Use bowtie2 (alternatively bowtie or bwa) and samtools to call variants in the parental inbred line, the F2 mutant population, and individual mutants
Filter SNPs by discarding background SNPs and by applying read-depth and allele-frequency criteria
All steps are bundled into a single DE App, ems_mutation
Because of computation time limits, a job on a large data set might not finish in time and would then fail. If this is the case, run each step separately as described in the Alternative route section below
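The read-depth and allele-frequency criteria mentioned above can be illustrated with a minimal Python sketch. It assumes the VCF was produced by samtools/bcftools, whose INFO column carries a DP4 tag (counts of high-quality ref-forward, ref-reverse, alt-forward, and alt-reverse bases). The function names and the cutoffs min_depth=5 and min_alt_freq=0.8 are hypothetical values chosen for illustration; they are not the thresholds used by ems_mutation.

```python
def parse_dp4(info):
    """Return the four DP4 counts from a VCF INFO string, or None if absent."""
    for field in info.split(";"):
        if field.startswith("DP4="):
            return [int(x) for x in field[4:].split(",")]
    return None


def passes_filters(vcf_line, min_depth=5, min_alt_freq=0.8):
    """True if a VCF data line meets the depth and alt-allele frequency cutoffs.

    The default cutoffs are illustrative assumptions, not the app's settings.
    """
    cols = vcf_line.rstrip("\n").split("\t")
    dp4 = parse_dp4(cols[7])          # column 8 of a VCF line is INFO
    if dp4 is None:
        return False
    depth = sum(dp4)
    if depth == 0 or depth < min_depth:
        return False
    alt_freq = (dp4[2] + dp4[3]) / depth
    return alt_freq >= min_alt_freq


# Made-up example records (not taken from the test data set)
line_ok = "chr8\t1000\t.\tG\tA\t50\t.\tDP=12;DP4=1,0,6,5;MQ=40\tGT:PL\t1/1:80,20,0"
line_low_freq = "chr8\t2000\t.\tC\tT\t30\t.\tDP=12;DP4=5,5,1,1;MQ=40\tGT:PL\t0/1:40,0,60"
line_shallow = "chr8\t3000\t.\tG\tA\t20\t.\tDP=3;DP4=0,0,2,1;MQ=40\tGT:PL\t1/1:30,9,0"

kept = [l for l in (line_ok, line_low_freq, line_shallow) if passes_filters(l)]
print(len(kept))
```

Only the first record survives here: the second has an alternate-allele fraction of 2/12, and the third has a depth of 3.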
Inputs, parameters, and outputs
Inputs:
The FASTA file of the reference genome sequence (required)
The two FASTQ files of paired-end reads from the background (parental inbred line) (required)
The two FASTQ files of paired-end reads from the F2 mutants (required)
Parameters:
The default minimum mapping quality (mapQ) for an alignment to be considered is 20
The default minimum base quality (baseQ) for a base to be considered is 20
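As a rough illustration of how these two cutoffs map onto samtools, the Python sketch below assembles a samtools mpileup command line. The -q (minimum mapQ), -Q (minimum baseQ), -u, and -f options are standard samtools mpileup options; whether ems_mutation passes exactly these options is an assumption, and build_mpileup_cmd and the BAM file name F2.sorted.bam are hypothetical.

```python
def build_mpileup_cmd(ref_fasta, bam, min_mapq=20, min_baseq=20):
    """Return the argument list for a samtools mpileup call with quality cutoffs."""
    return [
        "samtools", "mpileup",
        "-u",                   # uncompressed BCF output, suitable for piping to bcftools
        "-f", ref_fasta,        # faidx-indexed reference FASTA
        "-q", str(min_mapq),    # skip alignments with mapQ below this value
        "-Q", str(min_baseq),   # skip bases with baseQ below this value
        bam,
    ]


# F2.sorted.bam is a hypothetical file name for the sorted F2 alignments
cmd = build_mpileup_cmd("Sorghum_chr8.fa", "F2.sorted.bam")
print(" ".join(cmd))
```

The list form can be handed to subprocess.run once samtools is installed; the sketch only prints the command so that it runs without samtools present.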
Output:
bk.fitered.vcf is the VCF file of filtered SNPs in the background
F2.fitered.vcf is the VCF file of filtered SNPs in the F2
F2.snp.vcf is the VCF file of SNPs unique to the F2
F2.snp.vcf.EMS is the VCF file of SNPs induced by EMS in the F2
Alternative route
Follow the Discover Variants Using SAM Tools (Workflow Tutorial) workflow (align reads -> Reformat file -> Identify variants -> Filter variants) to generate the filtered vcf files for the background and F2 data sets separately
The DE App SAMTOOLS-0.1.19_mpileup_raw-vcf-out is recommended for the "Identify variants" step
The DE App SAMTOOLS-0.1.19_VCF-Utilities_varFilter is recommended for the "Filter variants" step
Filter the F2 SNPs against the background SNPs and identify the SNPs induced by EMS with the DE App EMS_mutation_filter to get the final result
If the same background data set is used again in another analysis, its filtered vcf file can be reused and does not need to be regenerated
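The final filtering step can be pictured with the Python sketch below. It is not the EMS_mutation_filter implementation. It assumes that a background SNP is matched by chromosome and position only, and that "induced by EMS" means the canonical EMS changes G to A and C to T (EMS predominantly causes G/C to A/T transitions). The names snp_key, ems_candidates, and EMS_CHANGES are hypothetical.

```python
# Canonical EMS-type changes as (REF, ALT); an assumption about the filter rule
EMS_CHANGES = {("G", "A"), ("C", "T")}


def snp_key(vcf_line):
    """Return (chrom, pos, ref, alt) from a tab-separated VCF data line."""
    cols = vcf_line.rstrip("\n").split("\t")
    return cols[0], int(cols[1]), cols[3].upper(), cols[4].upper()


def ems_candidates(background_lines, f2_lines):
    """Split F2 records into those absent from the background and the EMS-type subset."""
    background = set()
    for line in background_lines:
        if line.strip() and not line.startswith("#"):
            background.add(snp_key(line)[:2])   # match on chromosome and position

    unique, ems = [], []
    for line in f2_lines:
        if not line.strip() or line.startswith("#"):
            continue                            # skip blank and header lines
        chrom, pos, ref, alt = snp_key(line)
        if (chrom, pos) in background:
            continue                            # SNP already present in the parent
        unique.append(line)
        if (ref, alt) in EMS_CHANGES:
            ems.append(line)
    return unique, ems


# Made-up records for illustration
bk = ["#CHROM\tPOS\tID\tREF\tALT", "chr8\t100\t.\tG\tA"]
f2 = ["chr8\t100\t.\tG\tA", "chr8\t200\t.\tC\tT", "chr8\t300\t.\tA\tG"]
unique, ems = ems_candidates(bk, f2)
print(len(unique), len(ems))
```

In this toy input the SNP at position 100 is discarded as background, the two remaining SNPs are unique to the F2, and only the C to T change at position 200 is kept as an EMS candidate. With real data the two returned lists would correspond roughly to F2.snp.vcf and F2.snp.vcf.EMS.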
Related tutorials
Samtools, Bcftools, and Vcfutils
Discover Variants Using SAM Tools (Workflow Tutorial)
Integrated applications
iPlant validated workflow
Additional applications in DE
CI Enhancements documented
Community members assisted
Zhanguo Xin, Gloria Burow, John Burke (USDA-ARS Lubbock)
Yinping Jiao (CSHL, Ware Lab)
Datasets associated
Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> ems_mutation.
Input files:
background_R1.fq
background_R2.fq
F2_R1.fq
F2_R2.fq
Sorghum_chr8.fa
Output files:
bk.fitered.vcf
F2.fitered.vcf
F2.snp.vcf
F2.snp.vcf.EMS