EMS mutant sites identification Workflow

EMS mutant sites identification Workflow (DE App: ems_mutation)

Conceptual overview

This workflow identifies EMS-induced mutation sites from whole-genome sequencing data.

  • Align reads with bowtie2 (or bowtie, bwa) and call variants with samtools for the parental inbred line, the F2 mutant population, and individual mutants
  • Filter SNPs by discarding background SNPs and by read depth and allele frequency
  • All steps are bundled into a single DE App, ems_mutation
  • Because of computation time limits, a large data set might not finish in time, causing the job to fail. If this is the case, run each step separately as described in the Alternative route section below
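
The three filters listed above (background SNPs, read depth, allele frequency) can be sketched in Python as follows. This is a minimal illustration, not the code of the ems_mutation app: the function names, the thresholds min_depth and min_alt_fraction, and the choice to keep sites at or above the allele-frequency threshold are all assumptions. It also assumes the vcf records carry the DP4 tag that samtools 0.1.19 writes in the INFO column (ref-forward, ref-reverse, alt-forward, alt-reverse read counts).

```python
from typing import Dict, Set, Tuple

# A site is identified by chromosome, 1-based position, and alternate allele.
Site = Tuple[str, int, str]


def parse_info(info: str) -> Dict[str, str]:
    """Turn a VCF INFO string such as 'DP=30;DP4=2,1,14,13' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def keep_snp(vcf_line: str, background: Set[Site],
             min_depth: int = 5, min_alt_fraction: float = 0.8) -> bool:
    """Return True if one VCF data line survives all three filters.

    min_depth and min_alt_fraction are illustrative values only.
    """
    cols = vcf_line.rstrip("\n").split("\t")
    if len(cols) < 8:
        return False  # not a complete VCF data line
    chrom, pos, ref, alt, info = cols[0], cols[1], cols[3], cols[4], cols[7]
    if len(ref) != 1 or len(alt) != 1:
        return False  # keep simple single-base substitutions only
    # Filter 1: discard SNPs already present in the background.
    if (chrom, int(pos), alt) in background:
        return False
    dp4 = parse_info(info).get("DP4")
    if dp4 is None:
        return False  # cannot evaluate depth or allele frequency
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(n) for n in dp4.split(","))
    depth = ref_fwd + ref_rev + alt_fwd + alt_rev
    # Filter 2: read depth.
    if depth == 0 or depth < min_depth:
        return False
    # Filter 3: fraction of reads supporting the alternate allele.
    return (alt_fwd + alt_rev) / depth >= min_alt_fraction
```

The background set would be built once from the filtered background vcf file and then applied to every F2 record.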

Inputs, parameters, and outputs

  • Inputs:
    • The FASTA file of the reference genome sequence (required)
    • The two FASTQ files of paired-end reads from the background (parental inbred line) (required)
    • The two FASTQ files of paired-end reads from the F2 mutants (required)
  • Parameters:
    • Minimum mapping quality (mapQ) for a read to be considered; default is 20
    • Minimum base quality (baseQ) for a base to be considered; default is 20
  • Output:
    • bk.fitered.vcf: filtered SNPs in the background (vcf format)
    • F2.fitered.vcf: filtered SNPs in the F2 (vcf format)
    • F2.snp.vcf: SNPs unique to the F2 (vcf format)
    • F2.snp.vcf.EMS: SNPs induced by EMS in the F2 (vcf format)
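
To show where the mapQ and baseQ parameters would apply, the sketch below builds a samtools mpileup command line with both cut-offs. The exact command that ems_mutation runs is not given here, so treat this as an assumption; the helper name mpileup_command and the BAM file name F2.sorted.bam are hypothetical. The flags themselves (-q, -Q, -u, -f) are standard samtools mpileup options.

```python
import shlex
from typing import List


def mpileup_command(reference: str, bam: str,
                    min_mapq: int = 20, min_baseq: int = 20) -> List[str]:
    """Build a samtools mpileup argument list with the quality cut-offs.

    -q  minimum mapping quality for an alignment to be used
    -Q  minimum base quality for a base to be considered
    -u  write uncompressed BCF (suitable for piping)
    -f  reference FASTA file
    """
    return ["samtools", "mpileup",
            "-q", str(min_mapq), "-Q", str(min_baseq),
            "-u", "-f", reference, bam]


if __name__ == "__main__":
    # F2.sorted.bam is a hypothetical name for the sorted F2 alignment.
    print(shlex.join(mpileup_command("Sorghum_chr8.fa", "F2.sorted.bam")))
```

Raising either threshold discards more low-confidence alignments or bases before variants are called, at the cost of lower effective depth.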

Alternative route

  • Follow the Discover Variants Using SAM Tools (Workflow Tutorial) workflow (Align reads -> Reformat file -> Identify variants -> Filter variants) to generate the filtered vcf files for the background and F2 data sets separately
    • The DE App SAMTOOLS-0.1.19_mpileup_raw-vcf-out is recommended for the "Identify variants" step
    • The DE App SAMTOOLS-0.1.19_VCF-Utilities_varFilter is recommended for the "Filter variants" step
  • Filter the F2 SNPs against the background SNPs and identify the EMS-induced SNPs with the DE App EMS_mutation_filter to get the final result

    Make sure the background and F2 filtered vcf files have different file names. If they do not, rename one of them (preferably the background file, since it is likely to be reused)

  • If the same background data set is used again in a later analysis, its filtered vcf file can be reused and does not need to be regenerated
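
The last step, picking out EMS-induced SNPs, can be illustrated with a short Python sketch. EMS predominantly causes G-to-A and C-to-T transitions, so the sketch keeps only those substitutions. Whether EMS_mutation_filter applies exactly this criterion is an assumption, and the names is_ems_type and ems_records are hypothetical.

```python
from typing import Iterable, Iterator

# EMS mainly alkylates guanine, giving G>A changes (C>T on the other strand).
EMS_TRANSITIONS = {("G", "A"), ("C", "T")}


def is_ems_type(ref: str, alt: str) -> bool:
    """Return True for the substitutions typically produced by EMS."""
    return (ref.upper(), alt.upper()) in EMS_TRANSITIONS


def ems_records(vcf_lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines unchanged and data lines whose REF>ALT is EMS-type."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep the VCF header intact
            continue
        cols = line.split("\t")
        # Column 4 is REF and column 5 is ALT in a VCF data line.
        if len(cols) >= 5 and is_ems_type(cols[3], cols[4]):
            yield line
```

Applied to the lines of a file such as F2.snp.vcf, the generator would produce the content of a file analogous to F2.snp.vcf.EMS.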

Related tutorials


Samtools, Bcftools, and Vcfutils

Discover Variants Using SAM Tools (Workflow Tutorial)

Integrated applications

iPlant validated workflow

Additional applications in DE

CI Enhancements documented

Community members assisted

  • Zhanguo Xin, Gloria Burow, John Burke (USDA-ARS Lubbock)
  • Yinping Jiao (CSHL, Ware Lab)

Publications facilitated using this workflow


Datasets associated

Test data for this app is available in the Discovery Environment, in the Data window under Community Data -> iplantcollaborative -> example_data -> ems_mutation.

  • Input files:
    • background_R1.fq
    • background_R2.fq
    • F2_R1.fq
    • F2_R2.fq
    • Sorghum_chr8.fa
  • Output files:
    • bk.fitered.vcf
    • F2.fitered.vcf
    • F2.snp.vcf
    • F2.snp.vcf.EMS

Data Commons Requirements



Presentations associated with this workflow
