CNVnator-0.3.3

Please work through the documentation and add your comments at the bottom of this page, or email comments to support@cyverse.org. Thank you.

Rationale and Background

CNVnator is a tool for copy number variation (CNV) discovery and genotyping from the depth of coverage of mapped reads, that is, from read-depth (RD) analysis of personal genome sequencing. CNV in the genome is a complex phenomenon that is not completely understood. The method combines the established mean-shift approach with additional refinements (multiple-bandwidth partitioning and GC correction) to broaden the range of discovered CNVs.

Some useful information about CNVnator from this blog:

CNVnator can identify CNVs from a few hundred bases to megabases in length. Furthermore, the precision is good: within 200 bp for 90% of the breakpoints in a test case studied in the CNVnator paper (using a bin size of 100 bp). The higher your coverage, the smaller the bin size you can use, which gives greater precision. The authors recommend ~100-bp bins for 20-30x coverage, ~500-bp bins for 4-6x coverage, and ~30-bp bins for 100x coverage. However, they say that the bin size should not be shorter than the read length in your data.
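The bin-size guidance above can be written down as a small helper. This is only a sketch: the function name `suggest_bin_size` is hypothetical, and the cut-offs between the quoted coverage ranges (for example, what to do at 10x or 50x) are assumptions, since the guidance only names three coverage levels.

```python
def suggest_bin_size(coverage: float, read_length: int) -> int:
    """Suggest a bin size in bp from sequencing coverage and read length.

    Only the three anchor points (about 100x, 20-30x, 4-6x) come from the
    guidance quoted above; the boundaries between them are assumptions.
    """
    if coverage >= 100:
        bin_size = 30      # ~30-bp bins for 100x coverage
    elif coverage >= 20:
        bin_size = 100     # ~100-bp bins for 20-30x coverage
    else:
        bin_size = 500     # ~500-bp bins for 4-6x coverage
    # The bin size should not be shorter than the read length.
    return max(bin_size, read_length)


print(suggest_bin_size(coverage=25, read_length=100))
```

Treat the returned value as a starting point for the four bin-size parameters of the app, not as a tuned setting.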

Mandatory arguments

Input(s)
- Custom Reference genome or Reference genome from DE: The user has to select one of these options, otherwise the app will fail.
- Bam files: Make sure the bam files were generated by mapping reads to the reference genome selected above.
- Chromosome id or Chromosome ids from file: Chromosome names must be specified exactly as they appear in the bam header, e.g., chrX or X. The user can either specify a single chromosome id (for example, 10) or upload a file that contains multiple chromosome ids, one per line. The user has to select one of these options, otherwise the app will fail.
Parameter(s)
- Histogram bin size: The bin size (window size) for generating the read-depth histogram for all the windows in your genome assembly. For example, 100.
- Stat bin size: The bin size (window size) for calculating statistical significance (p-values) for the windows that have unusual read depth. For example, 100.
- Partition bin size: The bin size (window size) for partitioning the chromosomes/scaffolds into long regions (each of which could be longer than the window size) that have similar read depth, and so presumably similar copy number. For example, 100.
- Call bin size: The bin size (window size) for calling CNVs. For example, 100.
- Prefix: The prefix that will be added to the vcf file column when converting the CNVnator output to a vcf file.
Output
- The name of the output file: For example result
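The four bin-size parameters correspond to steps of the CNVnator command line as described in its README (`-his`, `-stat`, `-partition`, `-call`), preceded by a `-tree` step that extracts reads from the bam files. How this app invokes them internally is not documented on this page, so the following is only a sketch of the sequence: the function name `cnvnator_steps` and the `ref_dir` argument (a directory of per-chromosome FASTA files passed to `-d`) are assumptions, and the commands are only built here, not executed.

```python
from typing import List, Sequence


def cnvnator_steps(
    root: str,
    bams: Sequence[str],
    chroms: Sequence[str],
    his_bin: int,
    stat_bin: int,
    partition_bin: int,
    call_bin: int,
    ref_dir: str,  # hypothetical: directory holding per-chromosome FASTA files
) -> List[List[str]]:
    """Build the argument lists for the usual CNVnator steps, in order."""
    base = ["cnvnator", "-root", root]
    chrom_args = ["-chrom", *chroms] if chroms else []
    return [
        base + chrom_args + ["-tree", *bams],                       # read extraction
        base + chrom_args + ["-his", str(his_bin), "-d", ref_dir],  # histogram
        base + chrom_args + ["-stat", str(stat_bin)],               # statistics
        base + chrom_args + ["-partition", str(partition_bin)],     # RD partitioning
        base + chrom_args + ["-call", str(call_bin)],               # CNV calling
    ]


steps = cnvnator_steps(
    root="cnvnator.root",
    bams=["IS20351_DS_1_1.sorted.bam", "IS20351_DS_2_1.sorted.bam"],
    chroms=["10"],
    his_bin=100,
    stat_bin=100,
    partition_bin=100,
    call_bin=100,
    ref_dir="reference_by_chromosome",
)
for argv in steps:
    print(" ".join(argv))
```

The calls from the last step are written to standard output by CNVnator, so a wrapper would presumably redirect them into the named output file.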

Test Run using a single chromosome id

All files are located in the Community Data directory of the CyVerse Discovery Environment at the following path:

Community Data > iplantcollaborative > example_data > cnvnator (/iplant/home/shared/iplantcollaborative/example_data/cnvnator)

Mandatory arguments

Input(s)
- Custom Reference genome: Sorghum_bicolor.Sorbi1.20.dna.toplevel.fa
- Bam files: IS20351_DS_1_1.sorted.bam and IS20351_DS_2_1.sorted.bam
- Chromosome id: 10
Parameter(s)
- Histogram bin size: 100
- Stat bin size: 100
- Partition bin size: 100
- Call bin size: 100
- Prefix: test
Output
- The name of the output file: result

Test Run using a chromosome id file

All files are located in the Community Data directory of the CyVerse Discovery Environment at the following path:

Community Data > iplantcollaborative > example_data > cnvnator (/iplant/home/shared/iplantcollaborative/example_data/cnvnator)

Mandatory arguments

Input(s)
- Custom Reference genome: Sorghum_bicolor.Sorbi1.20.dna.toplevel.fa
- Bam files: IS20351_DS_1_1.sorted.bam and IS20351_DS_2_1.sorted.bam
- Chromosome ids from file: chr_list.txt
Parameter(s)
- Histogram bin size: 100
- Stat bin size: 100
- Partition bin size: 100
- Call bin size: 100
- Prefix: test
Output
- The name of the output file: result
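Because chromosome ids must match the names in the bam header, it can help to check an id file before launching the app. The sketch below assumes you have the bam header as text (for example, exported with `samtools view -H`); it reads reference names from the `SN:` field of `@SQ` lines, as defined by the SAM format. The function names and the sample header and id list are made up for illustration and are not the contents of the example files.

```python
from typing import List


def sq_names(sam_header: str) -> List[str]:
    """Return reference sequence names from the @SQ lines of a SAM header."""
    names = []
    for line in sam_header.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def read_chrom_ids(text: str) -> List[str]:
    """Parse a chromosome id file: one id per line, blank lines ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def missing_ids(ids: List[str], header_names: List[str]) -> List[str]:
    """Return the ids that do not appear in the bam header."""
    known = set(header_names)
    return [chrom for chrom in ids if chrom not in known]


# Made-up header and id list, for illustration only.
header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:1\tLN:1000\n@SQ\tSN:10\tLN:2000\n"
ids = read_chrom_ids("1\n10\nchr10\n")
print(missing_ids(ids, sq_names(header)))
```

Any id reported as missing (here `chr10`, because the header uses `10`) should be renamed to match the header exactly.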

Output files generated

cnvnator.root: Output ROOT file. This is a binary file, so do not try to open it.
result.cnvnator: The final output from CNVnator.
result.vcf: The final output from CNVnator in vcf format.

According to the CNVnator README file, the columns of the output file are:

CNV_type coordinates CNV_size normalized_RD e-val1 e-val2 e-val3 e-val4 q0

normalized_RD -- normalized to 1.
e-val1        -- is calculated using t-test statistics.
e-val2        -- is from the probability of RD values within the region to be in the tails of a gaussian distribution describing frequencies of RD values in bins.
e-val3        -- same as e-val1 but for the middle of CNV
e-val4        -- same as e-val2 but for the middle of CNV
q0            -- fraction of reads mapped with q0 quality
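The column layout above can be read into named fields with a few lines of Python. This is a minimal sketch, assuming whitespace-separated columns and coordinates written as chromosome:start-end; the `CnvCall` type, the `parse_call` function, and the sample line are made up for illustration and are not taken from the example data.

```python
from typing import NamedTuple


class CnvCall(NamedTuple):
    cnv_type: str
    chrom: str
    start: int
    end: int
    size: int
    normalized_rd: float
    e_val1: float
    e_val2: float
    e_val3: float
    e_val4: float
    q0: float


def parse_call(line: str) -> CnvCall:
    """Parse one line of CNVnator call output into a CnvCall."""
    fields = line.split()
    # Split on the last ':' in case the chromosome name itself contains one.
    chrom, span = fields[1].rsplit(":", 1)
    start, end = span.split("-")
    return CnvCall(
        fields[0],
        chrom,
        int(start),
        int(end),
        int(float(fields[2])),     # tolerate sizes written like 5e+03
        *map(float, fields[3:9]),  # normalized_RD, e-val1..e-val4, q0
    )


# Made-up line for illustration only.
sample = "deletion\t10:12001-17000\t5000\t0.35\t1.2e-08\t0\t3.4e-07\t0\t0.02"
call = parse_call(sample)
print(call.cnv_type, call.chrom, call.start, call.end, call.normalized_rd)
```

With calls in this form, simple filters (for example, on q0 or on e-val1) become one-line list comprehensions.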

More information about CNVnator can be found here: https://github.com/abyzovlab/CNVnator
