EMS mutant sites identification Workflow
EMS mutant sites identification Workflow (DE App: ems_mutation)
Conceptual overview
This workflow identifies EMS-induced mutation sites by whole-genome sequencing.
- Use bowtie2 (or bowtie, bwa) and samtools to call variants in the parental inbred line, in the F2 mutant population, and in individual mutants
- Filter SNPs by discarding background SNPs and by applying read depth and allele frequency thresholds
- All steps are bundled into a single DE App, ems_mutation
- Because of the computation time limit, a large data set might not finish in time, causing the job to fail. If this is the case, run each step separately as described in the Alternative route section below
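The read depth and allele frequency filter can be sketched in Python as follows. This is a minimal illustration, not the code inside ems_mutation. It assumes the VCF was produced by samtools 0.1.19 mpileup and bcftools, whose INFO column carries a DP4 tag (counts of high-quality ref-forward, ref-reverse, alt-forward and alt-reverse bases). The function names and the thresholds min_depth and min_alt_freq are hypothetical placeholders; the cutoffs actually used by the app are not stated here.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'DP=12;DP4=1,0,6,5' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value  # flag-only tags such as INDEL map to ""
    return fields


def passes_depth_and_frequency(vcf_line, min_depth=5, min_alt_freq=0.8):
    """Return True if a VCF record meets the (hypothetical) thresholds.

    Depth is the sum of the four DP4 counts; allele frequency is the
    fraction of those bases that support the alternate allele.
    """
    cols = vcf_line.rstrip("\n").split("\t")
    info = parse_info(cols[7])  # column 8 of a VCF record is INFO
    if "DP4" not in info:
        return False
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in info["DP4"].split(","))
    depth = ref_fwd + ref_rev + alt_fwd + alt_rev
    if depth < min_depth:
        return False
    return (alt_fwd + alt_rev) / depth >= min_alt_freq
```

With these placeholder thresholds, a record with DP4=1,0,6,5 passes (11 of 12 bases support the alternate allele), while one with DP4=5,5,1,1 is rejected.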
Inputs, parameters, and outputs
- Inputs:
- The fasta file of reference genome sequence (required)
- The two fastq files of paired-end reads of background (parental inbred line) (required)
- The two fastq files of paired-end reads of F2 mutants (required)
- Parameters:
- Minimum mapping quality (mapQ) for a read to be considered: 20 by default
- Minimum base quality (baseQ) for a base to be considered: 20 by default
- Output:
- bk.fitered.vcf: filtered SNPs in the background, in VCF format
- F2.fitered.vcf: filtered SNPs in the F2, in VCF format
- F2.snp.vcf: SNPs unique to the F2, in VCF format
- F2.snp.vcf.EMS: SNPs induced by EMS in the F2
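To illustrate how the two quality parameters might reach the variant caller, the sketch below builds a samtools 0.1.19 style variant-calling command line in Python. That the app passes them this way is an assumption. The options -q (minimum mapping quality) and -Q (minimum base quality) are standard samtools mpileup options; the function name mpileup_command and the BAM file name are hypothetical.

```python
import shlex


def mpileup_command(reference_fasta, sorted_bam, min_mapq=20, min_baseq=20):
    """Build a shell pipeline string for samtools 0.1.19 variant calling.

    The command is only constructed, not executed.
    """
    mpileup = [
        "samtools", "mpileup",
        "-u",                    # uncompressed BCF output for piping
        "-f", reference_fasta,   # reference genome in FASTA format
        "-q", str(min_mapq),     # skip alignments with mapQ below this value
        "-Q", str(min_baseq),    # skip bases with baseQ below this value
        sorted_bam,
    ]
    # bcftools 0.1.19: -v variants only, -c call SNPs, -g call genotypes
    call = ["bcftools", "view", "-vcg", "-"]
    return " | ".join(
        " ".join(shlex.quote(arg) for arg in cmd) for cmd in (mpileup, call)
    )


print(mpileup_command("Sorghum_chr8.fa", "F2.sorted.bam"))
```

The printed string can be inspected or redirected to a raw VCF file by hand; raising min_mapq or min_baseq makes variant calling stricter.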
Alternative route
- Follow the Discover Variants Using SAM Tools (Workflow Tutorial) workflow (Align reads -> Reformat file -> Identify variants -> Filter variants) to generate the filtered VCF files for the background and F2 data sets separately
- The DE App SAMTOOLS-0.1.19_mpileup_raw-vcf-out is recommended for the "Identify variants" step
- The DE App SAMTOOLS-0.1.19_VCF-Utilities_varFilter is recommended for the "Filter variants" step
- Filter the F2 SNPs against the background SNPs and identify the EMS-induced SNPs with the DE App EMS_mutation_filter to get the final result
- Make sure the background and F2 filtered VCF files have different file names. If they do not, rename one of them (renaming the background file is recommended, since it is likely to be reused)
- If the same background data set is used in another analysis, its filtered VCF file can be reused and does not need to be regenerated
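The final filtering step can be pictured with the following Python sketch. It is not the EMS_mutation_filter implementation. It assumes that background SNPs are matched on chromosome, position, reference base and alternate base, and that "EMS-induced" means the G-to-A and C-to-T transitions that EMS predominantly causes; whether the app applies exactly these rules is an assumption. The function names are hypothetical.

```python
# Base changes typical of EMS mutagenesis (G/C to A/T transitions).
EMS_TRANSITIONS = {("G", "A"), ("C", "T")}


def read_snps(vcf_lines):
    """Yield (key, line) for every single-base substitution record."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos = cols[0], int(cols[1])
        ref, alt = cols[3].upper(), cols[4].upper()
        if len(ref) == 1 and len(alt) == 1:  # ignore indels and multi-allelic sites
            yield (chrom, pos, ref, alt), line.rstrip("\n")


def ems_candidates(background_lines, f2_lines):
    """Return (F2-unique SNP lines, EMS-type subset of those lines)."""
    background = {key for key, _ in read_snps(background_lines)}
    unique = [(key, line) for key, line in read_snps(f2_lines)
              if key not in background]
    ems = [line for key, line in unique
           if (key[2], key[3]) in EMS_TRANSITIONS]
    return [line for _, line in unique], ems
```

Both arguments can be any iterable of VCF lines, for example open file handles for the filtered background and F2 VCF files; the two returned lists correspond conceptually to F2.snp.vcf and F2.snp.vcf.EMS.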
Related tutorials
Samtools, Bcftools, and Vcfutils
Discover Variants Using SAM Tools (Workflow Tutorial)
Integrated applications
iPlant validated workflow
Additional applications in DE
CI Enhancements documented
Community members assisted
- Zhanguo Xin, Gloria Burow, John Burke (USDA-ARS Lubbock)
- Yinping Jiao (CSHL, Ware Lab)
Datasets associated
Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> ems_mutation.
- Input files:
- background_R1.fq
- background_R2.fq
- F2_R1.fq
- F2_R2.fq
- Sorghum_chr8.fa
- Output files:
- bk.fitered.vcf
- F2.fitered.vcf
- F2.snp.vcf
- F2.snp.vcf.EMS