...
2.1.1 After logging into the Discovery Environment, click on the app window and enter snpeff in the search box. You will see two apps: SnpEff-4.3.1 and SnpEff-build-4.3.1.
Click on the SnpEff-4.3.1 app and enter "SnpEff-4.3.1_analysis1_Arabidopis" under the Analysis Name
...
- Use Arabidopsis_annotated.vcf for Output File Name and then click Launch Analysis
After a successful run you should see the output file Arabidopsis_annotated.vcf, which contains the annotations of the vcf file. Here are the first few lines of the annotated vcf file.
...
As you can see, SnpEff added functional annotations in the ANN
info field (eighth column in the VCF output file). Details about the 'ANN' field format can be found in the ANN Field section.
Note |
---|
Older SnpEff versions used the 'EFF' field (details about the 'EFF' field format can be found in the EFF Field section). |
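To make the ANN layout concrete, here is a minimal Python sketch that splits the ANN value of the INFO column into named subfields. The subfield order follows the ANN field format that SnpEff uses; the dictionary keys and the helper name parse_ann are chosen here for illustration, and the VCF record is made up rather than taken from Arabidopsis_annotated.vcf.

```python
# Names for the 16 pipe-separated ANN subfields, in order.
# The key spellings are illustrative; the order follows the ANN format.
ANN_KEYS = [
    "allele", "annotation", "impact", "gene_name", "gene_id",
    "feature_type", "feature_id", "transcript_biotype", "rank",
    "hgvs_c", "hgvs_p", "cdna_pos", "cds_pos", "aa_pos",
    "distance", "messages",
]


def parse_ann(vcf_line):
    """Return a list of dicts, one per annotation in the ANN INFO entry."""
    columns = vcf_line.rstrip("\n").split("\t")
    info = columns[7]  # INFO is the eighth column (index 7)
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            # Annotations for different transcripts are comma-separated.
            return [
                dict(zip(ANN_KEYS, ann.split("|")))
                for ann in entry[len("ANN="):].split(",")
            ]
    return []  # the record carries no ANN entry


# Made-up record, for illustration only.
example = "\t".join([
    "1", "1000", ".", "A", "G", "50", "PASS",
    "DP=20;ANN=G|missense_variant|MODERATE|GENE1|GENE1|transcript|"
    "GENE1.1|protein_coding|1/2|c.10A>G|p.Thr4Ala|10/300|10/300|4/99||",
])

for ann in parse_ann(example):
    print(ann["annotation"], ann["impact"], ann["gene_name"])
```

A record annotated against several transcripts yields one dictionary per transcript, so filtering by impact or gene name becomes a simple list comprehension.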
...
You need to create your own config file for SnpEff if your genome is not among the databases listed in the file snpeff_databases.csv. If your genome already has a database, you can skip to the running SnpEff step.
...
All files for this example are located in the Community Data directory of the CyVerse Discovery Environment at the following path:
Community Data > iplantcollaborative > example_data > snpEff > Custom_database (/iplant/home/shared/iplantcollaborative/example_data/snpEff/Custom_database)
2.2.1.1 After logging into the Discovery Environment, click on the app window and enter snpeff in the search box. You will see two apps: SnpEff-4.3.1 and SnpEff-build-4.3.1.
Click on the SnpEff-build-4.3.1 app and enter "SnpEff-4.3.1_analysis1_brassica.v_2" under the Analysis Name.
Warning |
---|
Make sure you check the box that says "Retain Inputs? Enabling this flag will copy all the input files into the analysis results folder." Otherwise the build will not work. |
Inputs:
1. Use snpEff.config as the input for the config file
Note |
---|
In order to tell SnpEff that there is a new genome available, you must update SnpEff's configuration file |
Note |
---|
This snpEff.config file already has the custom genome (brassica.v_2) added. To add your own custom genome, download this snpEff.config file and add two lines after the "# Non-standard Databases" section header: a comment line such as "# My Brassica genome" and a genome entry such as "brassica.v_2.genome = BrassicaRapa". |
2. Use brassica.v_2 as Input folder
What does the brassica.v_2 folder contain? It holds only two files: the compressed genome FASTA file, renamed to sequences.fa.gz, and the compressed GFF3 genome annotation file, renamed to genes.gff.gz. In addition, the folder name brassica.v_2 must match the first part of the genome entry in the config file, that is, the part of brassica.v_2.genome that precedes .genome.
3. Finally, the Genome name in the config file should be brassica.v_2. Make sure this name matches both the entry in the config file and the name of the input folder. Then click the Launch Analysis button.
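Because the build depends on the folder name, the two file names, and the config entry all agreeing, it can help to check them before uploading. The following is a minimal Python sketch of such a check, not part of SnpEff or the CyVerse app. The function names are hypothetical, and the parser assumes config entries of the form "name.genome = description" (a ":" separator is also accepted, as in Java properties files).

```python
import re
from pathlib import Path

# File names the build expects inside the genome folder (from the text above).
REQUIRED_FILES = ("sequences.fa.gz", "genes.gff.gz")


def config_genomes(config_text):
    """Collect genome names from lines like '<name>.genome = <description>'."""
    names = set()
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key = re.split(r"[=:]", line, maxsplit=1)[0].strip()
        if key.endswith(".genome"):
            names.add(key[: -len(".genome")])
    return names


def check_build_inputs(config_text, folder):
    """Return a list of problems; an empty list means the inputs agree."""
    folder = Path(folder)
    problems = []
    if folder.name not in config_genomes(config_text):
        problems.append(f"no '{folder.name}.genome' entry in the config file")
    for filename in REQUIRED_FILES:
        if not (folder / filename).is_file():
            problems.append(f"missing {filename} in {folder.name}")
    return problems


# Example with the entry shown in the note above; the folder path is assumed
# to be relative to the current working directory.
config = "# My Brassica genome\nbrassica.v_2.genome = BrassicaRapa\n"
print(check_build_inputs(config, "brassica.v_2"))
```

An empty list means the folder name, file names, and config entry are consistent; any strings returned describe what to fix before launching the build.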
After successful completion of the build, you will get three outputs:
- brassica.v_2 folder that contains sequences.fa.gz, genes.gff.gz and snpEffectPredictor.bin files
- logs folder
- snpEff.config file
We need the brassica.v_2 folder and the snpEff.config file for the next step.
2.2.1.2 Run snpeff using the custom build
- Click on the SnpEff-4.3.1 app and enter "SnpEff-4.3.1_analysis1_brassica" under the Analysis Name
- Inputs:
- Use snpEff.config Config file from above step
- Use brassica.v_2 for Database Name
- Use brassica.v_2 folder from above step
- Use Brassica_rapa.vcf.gz as input vcf file
- Outputs:
- Use brassica_annotated.vcf as output vcf file
After successful completion of the SnpEff analysis, you should get the file brassica_annotated.vcf, which contains the variants annotated against our custom database (one not present in SnpEff's own list of databases).
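For readers who want to repeat this step outside the Discovery Environment, the sketch below assembles a roughly equivalent SnpEff command line in Python. It assumes a local snpEff.jar and that the brassica.v_2 folder sits inside the directory passed with -dataDir. How the CyVerse app calls SnpEff internally is not described here, so the mapping from app fields to command-line arguments is an assumption, and the function name is hypothetical.

```python
def snpeff_annotate_command(config, data_dir, genome, input_vcf,
                            jar="snpEff.jar"):
    """Build the argument list for annotating input_vcf with a custom genome."""
    return [
        "java", "-jar", jar,
        "-c", config,          # config file that contains <genome>.genome
        "-dataDir", data_dir,  # directory that contains the <genome> folder
        genome,                # database name, e.g. brassica.v_2
        input_vcf,
    ]


cmd = snpeff_annotate_command(
    config="snpEff.config",
    data_dir=".",
    genome="brassica.v_2",
    input_vcf="Brassica_rapa.vcf.gz",
)
print(" ".join(cmd))

# SnpEff writes the annotated VCF to standard output, so a real run would
# redirect it into the output file. This part is left commented out because
# it needs Java and snpEff.jar to be installed:
#
# import subprocess
# with open("brassica_annotated.vcf", "w") as out:
#     subprocess.run(cmd, stdout=out, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if any path contains spaces.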