Rationale and Background
CNVnator is a tool for copy number variation (CNV) discovery and genotyping from the depth of coverage of mapped reads. CNV in the genome is a complex phenomenon that is not yet completely understood. CNVnator discovers and genotypes CNVs through read-depth (RD) analysis of personal genome sequencing data. It combines the established mean-shift approach with additional refinements (multiple-bandwidth partitioning and GC correction) to broaden the range of CNVs it can discover.
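The GC correction mentioned above can be illustrated with a minimal sketch. It assumes the correction scales each bin's read depth by the ratio of the genome-wide mean read depth to the mean read depth of all bins with the same GC content, and that GC content is given as an integer percentage per bin. The function name `gc_correct` is hypothetical, and this is not CNVnator's own code.

```python
from collections import defaultdict
from statistics import mean


def gc_correct(read_depths, gc_percents):
    """Scale each bin's RD by (global mean RD) / (mean RD of bins with the same GC %).

    Assumption: gc_percents holds one integer GC percentage per bin.
    """
    if len(read_depths) != len(gc_percents):
        raise ValueError("read_depths and gc_percents must have equal length")
    global_mean = mean(read_depths)
    # Group bins by GC content and compute the mean RD of each group.
    by_gc = defaultdict(list)
    for rd, gc in zip(read_depths, gc_percents):
        by_gc[gc].append(rd)
    gc_means = {gc: mean(values) for gc, values in by_gc.items()}
    corrected = []
    for rd, gc in zip(read_depths, gc_percents):
        stratum_mean = gc_means[gc]
        # Leave bins from an all-zero GC group unchanged to avoid dividing by zero.
        corrected.append(rd * global_mean / stratum_mean if stratum_mean > 0 else rd)
    return corrected


if __name__ == "__main__":
    # Hypothetical bins: GC-rich bins are under-covered before correction.
    print(gc_correct([100, 100, 50, 50], [40, 40, 60, 60]))
```

In the usage example, the two GC groups have mean depths of 100 and 50 against a global mean of 75, so after correction every bin sits at 75.0.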
Some useful information about CNVnator from this blog
CNVnator can identify CNVs ranging from a few hundred bases to megabases in length. Its precision is also good: in a test case studied in the CNVnator paper (using a bin size of 100 bp), 90% of the breakpoints were accurate to within 200 bp. The higher your coverage, the smaller the bin size you can use, which gives greater precision. They recommend using ~100-bp bins for 20-30x coverage, ~500-bp bins for 4-6x coverage, and ~30-bp bins for 100x coverage. However, they note that the bin size should not be shorter than the read length in your data.
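The bin-size recommendations above can be turned into a small helper. This is a minimal sketch: the function name `suggest_bin_size` is hypothetical, representing each quoted coverage range by a single value (5x, 25x, 100x) is an assumption, and choosing the nearest recommendation on a log scale is a heuristic of this sketch, not part of CNVnator. The quoted bin sizes shrink roughly in proportion as coverage grows, which is why a log scale is used.

```python
import math

# Recommendations quoted above as (typical coverage, suggested bin size in bp).
# Representing each coverage range by one value is an assumption of this sketch.
RECOMMENDED_BINS = [(5.0, 500), (25.0, 100), (100.0, 30)]


def suggest_bin_size(coverage, read_length):
    """Pick the recommended bin size for the nearest coverage, never below read length."""
    if coverage <= 0 or read_length <= 0:
        raise ValueError("coverage and read_length must be positive")
    # Compare coverages on a log scale, since bin size shrinks roughly
    # multiplicatively as coverage grows.
    _, bin_size = min(
        RECOMMENDED_BINS,
        key=lambda entry: abs(math.log(coverage) - math.log(entry[0])),
    )
    # The bin size should not be shorter than the read length.
    return max(bin_size, read_length)


if __name__ == "__main__":
    print(suggest_bin_size(30, 100))   # 30x coverage, 100-bp reads
    print(suggest_bin_size(100, 150))  # 100x coverage, 150-bp reads
```

With 100x coverage and 150-bp reads, the recommended 30-bp bin is shorter than the reads, so the helper falls back to the read length.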
Custom Reference genome or Reference genome from DE: Select one of these two options; otherwise the app will fail.
Bam files: Make sure the BAM files were generated by mapping reads to the reference genome selected above.
Chromosome id or Chromosome ids from file: Chromosome names must be specified exactly as they appear in the BAM header, e.g., chrX or X. Either specify a single chromosome id (for example, 10) or upload a file that contains multiple chromosome ids, one per line. Select one of these two options; otherwise the app will fail.
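Because chromosome names must match the BAM header exactly, it can help to check a chromosome-list file before launching the app. The minimal sketch below assumes you have the header as text (for example, from `samtools view -H file.bam`) and reads the `SN:` tag of each `@SQ` line. The function names are hypothetical, and the header in the usage example is made up.

```python
def header_sequence_names(sam_header_text):
    """Return the SN: (sequence name) values from the @SQ lines of a SAM/BAM header."""
    names = []
    for line in sam_header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def missing_chromosomes(list_text, sam_header_text):
    """Return ids from a one-id-per-line list that do not appear in the header."""
    known = set(header_sequence_names(sam_header_text))
    requested = [line.strip() for line in list_text.splitlines() if line.strip()]
    return [chrom for chrom in requested if chrom not in known]


if __name__ == "__main__":
    # Hypothetical header; lengths are placeholders.
    header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr10\tLN:1000\n@SQ\tSN:chrX\tLN:2000\n"
    print(missing_chromosomes("10\nchrX\n", header))
```

Here `10` is reported as missing because the header names that chromosome `chr10`, which is exactly the mismatch the parameter description warns about.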
Histogram bin size: The bin size (window size) used to generate the read-depth histogram over all windows in your genome assembly, for example 100.
Stat bin size: The bin size (window size) used to calculate statistical significance (p-values) for windows with unusual read depth, for example 100.
Partition bin size: The bin size (window size) used to partition the chromosomes/scaffolds into long regions (each of which can be longer than the window size) that have similar read depth, and therefore presumably similar copy number, for example 100.
Call bin size: The bin size (window size) used to call CNVs, for example 100.
Prefix: The prefix added to the VCF file column when the CNVnator output is converted to VCF format.
The name of the output file: For example, result.
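The parameters above correspond to the usual stages of a CNVnator run. The sketch below only builds the command lines, assuming the app wraps the standard `cnvnator` steps (`-tree`, `-his`, `-stat`, `-partition`, `-call`) and the `cnvnator2VCF.pl` conversion script; nothing is executed. The function name, the `-fasta` reference option, and the `-prefix` option of `cnvnator2VCF.pl` are assumptions to check against the README of your installed CNVnator version.

```python
import shlex


def cnvnator_commands(bam_files, chromosomes, reference_fasta, his_bin, stat_bin,
                      partition_bin, call_bin, output_name, prefix=None,
                      root_file="cnvnator.root"):
    """Return (argv, stdout_file) pairs for a typical CNVnator run (not executed).

    stdout_file is None when a step only updates the ROOT file.
    """
    calls_file = f"{output_name}.cnvnator"
    vcf_file = f"{output_name}.vcf"
    steps = [
        # 1. Extract read mappings for the selected chromosomes into the ROOT file.
        (["cnvnator", "-root", root_file, "-tree", *bam_files,
          "-chrom", *chromosomes], None),
        # 2. Build the read-depth histogram (the reference supplies GC content).
        (["cnvnator", "-root", root_file, "-his", str(his_bin),
          "-fasta", reference_fasta], None),
        # 3. Compute the statistics used for significance testing.
        (["cnvnator", "-root", root_file, "-stat", str(stat_bin)], None),
        # 4. Partition read depth into regions of similar copy number.
        (["cnvnator", "-root", root_file, "-partition", str(partition_bin)], None),
        # 5. Call CNVs; the calls are printed to stdout and saved to calls_file.
        (["cnvnator", "-root", root_file, "-call", str(call_bin)], calls_file),
    ]
    # 6. Convert the calls to VCF (the -prefix option is an assumption).
    to_vcf = ["cnvnator2VCF.pl"]
    if prefix:
        to_vcf += ["-prefix", prefix]
    to_vcf.append(calls_file)
    steps.append((to_vcf, vcf_file))
    return steps


if __name__ == "__main__":
    # "chr10" and "genome.fa" are hypothetical placeholders.
    for argv, stdout_file in cnvnator_commands(
            ["IS20351_DS_1_1.sorted.bam", "IS20351_DS_2_1.sorted.bam"],
            ["chr10"], "genome.fa", 100, 100, 100, 100, "result"):
        redirect = f" > {stdout_file}" if stdout_file else ""
        print(shlex.join(argv) + redirect)
```

Returning argument lists rather than shell strings keeps file names with spaces safe; `shlex.join` is used only to print a readable version of each command.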
Test Run using a single chromosome id
All files are located in the Community Data directory of the CyVerse Discovery Environment at the following path:
Community Data > iplantcollaborative > example_data > cnvnator (/iplant/home/shared/iplantcollaborative/example_data/cnvnator)
Bam files: IS20351_DS_1_1.sorted.bam and IS20351_DS_2_1.sorted.bam
Chromosome id: chr_list.txt
Histogram bin size: 100
Stat bin size: 100
Partition bin size: 100
Call bin size: 100
The name of the output file: result
Output files generated
cnvnator.root: The output ROOT file. It is a binary file, so do not try to open it.
result.cnvnator: The final output from CNVnator.
result.vcf: The final output from CNVnator in VCF format.
According to the CNVnator README file, the columns of the output file are:
CNV_type coordinates CNV_size normalized_RD e-val1 e-val2 e-val3 e-val4 q0
normalized_RD -- normalized to 1.
e-val1 -- is calculated using t-test statistics.
e-val2 -- is from the probability of RD values within the region to be in the tails of a Gaussian distribution describing frequencies of RD values in bins.
e-val3 -- same as e-val1 but for the middle of CNV
e-val4 -- same as e-val2 but for the middle of CNV
q0 -- fraction of reads mapped with q0 quality
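To work with `result.cnvnator` programmatically, each line can be split into the nine columns listed above. This minimal sketch assumes whitespace-separated columns and coordinates written as `chrom:start-end`. The class and function names are hypothetical, and the filter thresholds are illustrative values, not recommendations from the CNVnator README.

```python
from dataclasses import dataclass


@dataclass
class CnvCall:
    cnv_type: str
    chrom: str
    start: int
    end: int
    size: int
    normalized_rd: float
    e_val1: float
    e_val2: float
    e_val3: float
    e_val4: float
    q0: float


def parse_call(line):
    """Parse one CNVnator output line into a CnvCall."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    cnv_type, coords = fields[0], fields[1]
    # rsplit keeps any ':' inside the chromosome name intact.
    chrom, span = coords.rsplit(":", 1)
    start, end = (int(x) for x in span.split("-"))
    # Sizes may be printed in scientific notation, so parse via float.
    size = int(float(fields[2]))
    rd, e1, e2, e3, e4, q0 = (float(x) for x in fields[3:])
    return CnvCall(cnv_type, chrom, start, end, size, rd, e1, e2, e3, e4, q0)


def passes_filter(call, max_e_val1=0.01, max_q0=0.5):
    """Keep significant calls with few q0 reads (illustrative thresholds only)."""
    return call.e_val1 < max_e_val1 and call.q0 < max_q0


if __name__ == "__main__":
    # Hypothetical line, not real CNVnator output.
    line = "deletion\tchr10:10001-12000\t2000\t0.43\t1.2e-05\t3.4e-10\t2.1e-04\t8.8e-08\t0.05"
    call = parse_call(line)
    print(call.chrom, call.start, call.end, passes_filter(call))
```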