...

2.1.1 After logging into the Discovery Environment, click on the app window and enter snpeff in the search box. You will see two apps: SnpEff-4.3.1 and SnpEff-build-4.3.1.


Click on the SnpEff-4.3.1 app and enter "SnpEff-4.3.1_analysis1_Arabidopis" under the Analysis Name

...

  1. Use Arabidopsis_annotated.vcf for Output File Name and then click Launch Analysis


After a successful run you should see the output file Arabidopsis_annotated.vcf, which contains the annotations of the vcf file. Here are the first few lines of the annotated vcf file.

...

As you can see, SnpEff added functional annotations in the ANN info field (eighth column in the VCF output file). Details about the 'ANN' field format can be found in the ANN Field section.

Note

Older SnpEff versions used the 'EFF' field (details about the 'EFF' field format can be found in the EFF Field section).
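To make the ANN field easier to read, the following is a minimal Python sketch that splits the ANN entry of one VCF data line into named subfields. The subfield names follow the published ANN format; the function name parse_ann and the example record are made up for illustration and are not taken from Arabidopsis_annotated.vcf.

```python
# Subfield names of one ANN annotation, in order, as described in the ANN format.
ANN_KEYS = [
    "Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID",
    "Feature_Type", "Feature_ID", "Transcript_BioType", "Rank", "HGVS.c",
    "HGVS.p", "cDNA.pos/length", "CDS.pos/length", "AA.pos/length",
    "Distance", "ERRORS/WARNINGS/INFO",
]


def parse_ann(vcf_line):
    """Return a list of dicts, one per annotation in the ANN entry of a VCF data line."""
    columns = vcf_line.rstrip("\n").split("\t")
    info = columns[7]  # INFO is the eighth column (index 7)
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            # Several annotations for the same variant are separated by commas.
            annotations = entry[len("ANN="):].split(",")
            return [dict(zip(ANN_KEYS, a.split("|"))) for a in annotations]
    return []  # no ANN entry on this line


# Hypothetical record, for illustration only.
example = "\t".join([
    "1", "1000", ".", "A", "G", "50", "PASS",
    "DP=10;ANN=G|missense_variant|MODERATE|GENE1|GENE1|transcript|GENE1.1"
    "|protein_coding|1/2|c.10A>G|p.Thr4Ala|10/300|10/300|4/99||",
])

for ann in parse_ann(example):
    print(ann["Annotation"], ann["Annotation_Impact"], ann["Gene_Name"])
```

Header lines (those starting with #) have no INFO column and should be skipped before calling parse_ann.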

...

You need to create your own config file for SnpEff if your genome is not listed in the database file snpeff_databases.csv. If your genome already has a database, you can skip to the running SnpEff step.
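One way to check the list without scrolling through it is sketched below in Python. It assumes only that snpeff_databases.csv is an ordinary comma-separated file; the column layout is not assumed, so every cell is compared. The function name genome_in_database_list and the sample text are hypothetical.

```python
import csv
import io


def genome_in_database_list(csv_text, genome_name):
    """Return True if any cell of the CSV text equals genome_name (case-insensitive)."""
    wanted = genome_name.strip().lower()
    for row in csv.reader(io.StringIO(csv_text)):
        if any(cell.strip().lower() == wanted for cell in row):
            return True
    return False


# Made-up stand-in for snpeff_databases.csv; the real file's columns may differ.
sample = "genome,organism\nhypothetical_genome_1,Example organism\n"

if genome_in_database_list(sample, "brassica.v_2"):
    print("Database exists: skip to running SnpEff")
else:
    print("Not listed: build a custom database first")
```

To use it on the real file, read the downloaded snpeff_databases.csv into a string and pass it as csv_text.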

...

All files for the basic example are located in the Community Data directory of the CyVerse Discovery Environment at the following path:

Community Data > iplantcollaborative > example_data > snpEff > Custom_database (/iplant/home/shared/iplantcollaborative/example_data/snpEff/Custom_database)

2.2.1.1 After logging into the Discovery Environment, click on the app window and enter snpeff in the search box. You will see two apps: SnpEff-4.3.1 and SnpEff-build-4.3.1.

Click on the SnpEff-build-4.3.1 app and enter "SnpEff-4.3.1_analysis1_brassica.v_2" under the Analysis Name.

Warning

Make sure you check the box that says "Retain Inputs? Enabling this flag will copy all the input files into the analysis results folder." Otherwise the build will not work.

Inputs:

  1. Use snpEff.config as an input for config file
Note

To tell SnpEff that a new genome is available, you must add a new genome entry to SnpEff's configuration file, snpEff.config.
If your genome, or one of its chromosomes, uses a non-standard codon table, you must update snpEff.config accordingly. A typical case is mitochondrial DNA: you specify that chromosome 'MT' uses the codon.Invertebrate_Mitochondrial codon table. Another common case is adding a bacterial genome, where you specify that the codon table is Bacterial_and_Plant_Plastid.

Note

This snpEff.config file already has the custom genome (brassica.v_2) added. If you want to add your own custom genome, download this snpEff.config file and add the following two lines after the "Non-standard Databases" section header. Here is an example:

#---
# Non-standard Databases
#---

# My Brassica genome
brassica.v_2.genome = BrassicaRapa

  2. Use brassica.v_2 as Input folder

What is in the brassica.v_2 folder? If you look inside this folder, you will find only two files: the compressed genome fasta file, renamed to sequences.fa.gz, and the compressed genome annotation gff3 file, renamed to genes.gff.gz. In addition, the folder name brassica.v_2 must match the first part of the entry name in the config file (for example, brassica.v_2 in brassica.v_2.genome).

  3. Finally, the Genome name in the config file should be brassica.v_2. Again, make sure that this name matches the names in the config file and the input folder. Then click the Launch Analysis button.
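Because the build depends on three names agreeing (the config entry, the input folder, and the Genome name), a quick local check before uploading can help. The Python sketch below writes the two config lines for a genome and verifies that a folder has the expected name and files. The helper names genome_entry_lines and check_build_inputs are hypothetical, and the check reflects only the rules stated above, not everything the SnpEff-build app may validate.

```python
import re
import tempfile
from pathlib import Path

# The two files the input folder is expected to contain.
REQUIRED_FILES = ("sequences.fa.gz", "genes.gff.gz")


def genome_entry_lines(genome_name, description, comment):
    """Return the comment line and the genome entry line for snpEff.config."""
    return [f"# {comment}", f"{genome_name}.genome = {description}"]


def check_build_inputs(config_text, folder, genome_name):
    """Return a list of problems; an empty list means the names line up."""
    problems = []
    folder = Path(folder)
    if folder.name != genome_name:
        problems.append(f"folder is named {folder.name}, expected {genome_name}")
    for name in REQUIRED_FILES:
        if not (folder / name).is_file():
            problems.append(f"missing {name} in {folder.name}")
    # Look for an uncommented '<genome_name>.genome' key followed by '=' or ':'.
    pattern = re.compile(rf"^\s*{re.escape(genome_name)}\.genome\s*[=:]")
    if not any(pattern.match(line) for line in config_text.splitlines()):
        problems.append(f"no {genome_name}.genome entry in the config file")
    return problems


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    genome_dir = Path(tmp) / "brassica.v_2"
    genome_dir.mkdir()
    for name in REQUIRED_FILES:
        (genome_dir / name).touch()
    config = "\n".join(
        genome_entry_lines("brassica.v_2", "BrassicaRapa", "My Brassica genome")
    )
    print(config)
    print(check_build_inputs(config, genome_dir, "brassica.v_2"))
```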

After successful completion of the build, you will get three outputs: 

  1. brassica.v_2 folder that contains sequences.fa.gz, genes.gff.gz and snpEffectPredictor.bin files
  2. logs folder
  3. snpEff.config file

We need the brassica.v_2 folder and the snpEff.config file for the next step.

2.2.1.2 Run snpeff using the custom build

  1. Click on the SnpEff-4.3.1 app and enter "SnpEff-4.3.1_analysis1_brassica" under the Analysis Name
  2. Inputs:
    1. Use snpEff.config Config file from above step
    2. Use brassica.v_2 for Database Name
    3. Use brassica.v_2 folder from above step
    4. Use Brassica_rapa.vcf.gz as input vcf file
  3. Outputs:
    1. Use brassica_annotated.vcf as output vcf file 
  4. Launch analysis

After successful completion of the SnpEff analysis, you should get brassica_annotated.vcf, which contains the variants annotated against our custom database (one that is not present in SnpEff's own database list).
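For readers who want to reproduce this step outside the Discovery Environment, the Python sketch below assembles a roughly equivalent SnpEff command line from the same inputs. It uses SnpEff's -c (config file) and -dataDir (directory containing the database folder) options; how the SnpEff-4.3.1 app actually invokes SnpEff is an assumption here, as are the jar location and the function name snpeff_annotate_command. The command is only built and printed, not executed.

```python
import shlex


def snpeff_annotate_command(config, data_dir, genome, input_vcf, jar="snpEff.jar"):
    """Build the argument list for annotating input_vcf with a custom database.

    data_dir is the directory that contains the database folder (here, the
    parent of brassica.v_2), not the database folder itself.
    """
    return [
        "java", "-jar", jar,
        "-c", config,          # config file that contains the genome entry
        "-dataDir", data_dir,  # where SnpEff looks for the database folder
        genome,                # database name, e.g. brassica.v_2
        input_vcf,
    ]


cmd = snpeff_annotate_command(
    config="snpEff.config",
    data_dir=".",
    genome="brassica.v_2",
    input_vcf="Brassica_rapa.vcf.gz",
)

# SnpEff writes the annotated VCF to standard output, so redirect it to a file.
print(shlex.join(cmd) + " > brassica_annotated.vcf")
```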

Tool Source for App

http://snpeff.sourceforge.net/index.html