CNVnator-0.3.3
Upendra Kumar Devisetty
Please work through the documentation and add your comments at the bottom of this page, or email comments to support@cyverse.org. Thank you.
Rationale and Background
Some useful information about CNVnator from this blog
CNVnator can identify CNVs ranging from a few hundred bases to megabases in length. Precision is also good: in a test case studied in the CNVnator paper, 90% of the breakpoints were located to within 200 bp (using a bin size of 100 bp). The higher your coverage, the smaller the bin size you can use, which gives greater precision. The recommendation is to use ~100-bp bins for 20-30x coverage, ~500-bp bins for 4-6x coverage, and ~30-bp bins for 100x coverage. However, the bin size should not be shorter than the read length of your data.
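The coverage guidance above can be turned into a small helper. This is only a sketch: the function name `suggest_bin_size` and the cut-offs used between the quoted coverage ranges (for example, what to do at 50x) are assumptions made for illustration, not part of CNVnator or of this app.

```python
def suggest_bin_size(coverage: float, read_length: int) -> int:
    """Return a rough bin size (bp) for a given mean coverage.

    The three bin sizes come from the text above; the thresholds that
    separate them are assumptions chosen for this sketch.
    """
    if coverage >= 100:
        bin_size = 30    # ~30-bp bins for 100x coverage
    elif coverage >= 20:
        bin_size = 100   # ~100-bp bins for 20-30x coverage
    else:
        bin_size = 500   # ~500-bp bins for 4-6x coverage
    # The bin size should not be shorter than the read length.
    return max(bin_size, read_length)


if __name__ == "__main__":
    for cov in (5, 25, 100):
        print(cov, suggest_bin_size(cov, read_length=100))
```

With 100-bp reads, the read-length floor means the suggestion never drops below 100, even at 100x coverage.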
Mandatory arguments
- Input(s)
- Custom Reference genome or Reference genome from DE: The user has to select one of these options; otherwise the app will fail.
- Bam files: Make sure the BAM files were generated by mapping to the reference genome selected above.
- Chromosome id or Chromosome ids from file: Chromosome names must be specified exactly as they appear in the BAM header, e.g., chrX or X. The user can either specify a single chromosome id (for example, 10) or upload a file that contains multiple chromosome ids, one per line. The user has to select one of these options; otherwise the app will fail.
- Parameter(s)
- Histogram bin size: The bin size (window size) for generating the histogram for all the windows in your genome assembly. For example, 100.
- Stat bin size: The bin size (window size) for calculating statistical significance (p-values) for the windows that have unusual read depth. For example, 100.
- Partition bin size: The bin size (window size) for partitioning the chromosomes/scaffolds into long regions (each of which can be longer than the window size) that have similar read depth, and so presumably similar copy number. For example, 100.
- Call bin size: The bin size (window size) for calling CNVs. For example, 100.
- Prefix: The prefix that will be added to the VCF file column when converting the CNVnator output to a VCF file.
- Output
- The name of the output file: For example result
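The four bin-size parameters above correspond to successive CNVnator steps. The sketch below shows how such a run could be assembled, assuming the app follows the step order given in the CNVnator README (`-tree`, `-his`, `-stat`, `-partition`, `-call`). The function name `build_cnvnator_steps` and the `genome_dir` argument are hypothetical, the commands the app actually runs are not documented here, and the flags should be checked against the installed CNVnator version.

```python
from typing import List


def build_cnvnator_steps(
    root_file: str,
    bam_files: List[str],
    chrom_ids: List[str],
    genome_dir: str,
    his_bin: int,
    stat_bin: int,
    partition_bin: int,
    call_bin: int,
) -> List[List[str]]:
    """Build argument lists for the CNVnator steps (nothing is executed)."""
    # Options shared by every step: the ROOT file and the chromosomes.
    base = ["cnvnator", "-root", root_file, "-chrom", *chrom_ids]
    return [
        base + ["-tree", *bam_files],                     # extract read mapping
        base + ["-his", str(his_bin), "-d", genome_dir],  # read-depth histogram
        base + ["-stat", str(stat_bin)],                  # statistics
        base + ["-partition", str(partition_bin)],        # RD signal partitioning
        base + ["-call", str(call_bin)],                  # CNV calls on stdout
    ]


if __name__ == "__main__":
    steps = build_cnvnator_steps(
        "cnvnator.root", ["sample_1.bam", "sample_2.bam"], ["10"],
        "genome_dir", 100, 100, 100, 100,
    )
    for step in steps:
        print(" ".join(step))
```

Returning argument lists rather than running anything keeps the sketch safe to try without CNVnator installed.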
Test Run using a single chromosome id
All files are located in the Community Data directory of the CyVerse Discovery Environment at the following path:
Community Data > iplantcollaborative > example_data > cnvnator (/iplant/home/shared/iplantcollaborative/example_data/cnvnator)
Mandatory arguments
- Input(s)
- Custom Reference genome: Sorghum_bicolor.Sorbi1.20.dna.toplevel.fa
- Bam files: IS20351_DS_1_1.sorted.bam and IS20351_DS_2_1.sorted.bam
- Chromosome id: 10
- Parameter(s)
- Histogram bin size: 100
- Stat bin size: 100
- Partition bin size: 100
- Call bin size: 100
- Prefix: test
- Output
- The name of the output file: result
Test Run using a chromosome id file
All files are located in the Community Data directory of the CyVerse Discovery Environment at the following path:
Community Data > iplantcollaborative > example_data > cnvnator (/iplant/home/shared/iplantcollaborative/example_data/cnvnator)
Mandatory arguments
- Input(s)
- Custom Reference genome: Sorghum_bicolor.Sorbi1.20.dna.toplevel.fa
- Bam files: IS20351_DS_1_1.sorted.bam and IS20351_DS_2_1.sorted.bam
- Chromosome ids from file: chr_list.txt
- Parameter(s)
- Histogram bin size: 100
- Stat bin size: 100
- Partition bin size: 100
- Call bin size: 100
- Prefix: test
- Output
- The name of the output file: result
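Because the chromosome id file must hold one id per line, and the ids must match the names in the BAM header, it can help to check the file before launching the app. The following is a minimal sketch. The function names and the example ids are hypothetical, the actual contents of chr_list.txt are not shown on this page, and the header names would have to be taken from your own BAM files.

```python
from typing import Iterable, List


def read_chromosome_ids(lines: Iterable[str]) -> List[str]:
    """Return one chromosome id per non-empty line, stripped of whitespace."""
    ids = []
    for line in lines:
        name = line.strip()
        if name:
            ids.append(name)
    return ids


def missing_from_header(chrom_ids: Iterable[str], header_names: Iterable[str]) -> List[str]:
    """Return the ids that do not exactly match any BAM header name."""
    known = set(header_names)
    return [chrom for chrom in chrom_ids if chrom not in known]


if __name__ == "__main__":
    # Hypothetical file contents and header names, for illustration only.
    example_file = ["1\n", "2\n", "chr10\n", "\n"]
    header_names = ["1", "2", "10"]
    ids = read_chromosome_ids(example_file)
    print(missing_from_header(ids, header_names))
```

In this example, `chr10` is reported because the header uses `10`, which is the kind of naming mismatch that the chromosome id argument warns about.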
Output files generated
- cnvnator.root: Output ROOT file. This is a binary file, so do not try to open it.
- result.cnvnator: The final output from CNVnator.
- result.vcf: The final output from CNVnator in VCF format.
According to the CNVnator README file, the columns of the output file are:
CNV_type coordinates CNV_size normalized_RD e-val1 e-val2 e-val3 e-val4 q0
- normalized_RD: normalized to 1.
- e-val1: is calculated using t-test statistics.
- e-val2: is from the probability of RD values within the region to be in the tails of a Gaussian distribution describing frequencies of RD values in bins.
- e-val3: same as e-val1 but for the middle of CNV.
- e-val4: same as e-val2 but for the middle of CNV.
- q0: fraction of reads mapped with q0 quality.
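The column layout above can be read into a small record for downstream filtering. This is a sketch only: it assumes the fields are whitespace-separated and that the coordinates column has the form `chrom:start-end`. The `CnvCall` type, the `parse_call` function, and the example line are all made up for illustration.

```python
from typing import NamedTuple


class CnvCall(NamedTuple):
    cnv_type: str
    chrom: str
    start: int
    end: int
    size: int
    normalized_rd: float
    e_val1: float
    e_val2: float
    e_val3: float
    e_val4: float
    q0: float


def parse_call(line: str) -> CnvCall:
    """Parse one line of CNVnator output into a CnvCall."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    # Assumed coordinate format: chrom:start-end
    chrom, span = fields[1].rsplit(":", 1)
    start, end = span.split("-")
    values = [float(value) for value in fields[3:9]]
    # int(float(...)) tolerates sizes written as 2000 or 2e+03.
    return CnvCall(fields[0], chrom, int(start), int(end), int(float(fields[2])), *values)


if __name__ == "__main__":
    # Hypothetical line, not taken from a real run.
    example = "deletion 10:1001-3000 2000 0.35 1.2e-08 3.4e-10 2.5e-07 4.1e-09 0.02"
    print(parse_call(example))
```

A named tuple keeps each column addressable by name, so a filter such as `call.q0 < 0.5` reads the same way as the column description.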
More information about CNVnator can be found here: https://github.com/abyzovlab/CNVnator