Update - 4-21-15
- I've been working on my script to import Bismark's Methylation Extractor output file into CoGe.
- The program provides separate files for each sequence context (CG, CHG, and CHH).
- After stripping out the header and sorting by chromosome and position this is essentially the format (tab delimited):
- What I need is an output in this format for loading into CoGe (csv or tsv):
Chr#,Position,% Methylation (as decimal),# of methylated out of total reads
- I played around with this for a while, but didn't get very far. Progress is on my GitHub.
Rough Pseudocode:
- Read in the tsv.
- Make a dictionary.
- Iterate over the rows in the tsv and find entries for which the chromosome and position (col 3 and 4) are the same; store this as a tuple (Chr#, Position).
- For the rows that share a position, count how many have a + and how many have a - in column 2, then divide the count of + by the total.
- Store the fraction of methylated (+) reads and the total number of reads for that position as a list [methylated, total].
- Add each to the dictionary: the tuple will be the key (immutable), the list will be the value.
- Write the dictionary to the output file.
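Here is a minimal Python sketch of the steps above. It is not a finished importer. It assumes the layout described in the pseudocode: column 2 holds + or -, column 3 the chromosome, and column 4 the position. The function names (`count_calls`, `write_summary`, `convert`) are made up for illustration. It keeps raw counts in the dictionary and only computes the fraction when writing. Writing the total read count as the last output column is my reading of the target format and may need adjusting.

```python
import csv
from collections import defaultdict


def count_calls(rows):
    """Tally methylated (+) and total calls per (chromosome, position)."""
    # (chrom, pos) -> [methylated, total]; the tuple key is immutable.
    counts = defaultdict(lambda: [0, 0])
    for row in rows:
        if len(row) < 4:
            continue  # skip blank or truncated lines
        state, chrom, pos = row[1], row[2], int(row[3])
        if state not in ("+", "-"):
            continue  # skip header lines or anything unexpected
        entry = counts[(chrom, pos)]
        if state == "+":
            entry[0] += 1
        entry[1] += 1
    return counts


def write_summary(counts, out_handle):
    """Write chromosome, position, fraction methylated, total reads as csv."""
    writer = csv.writer(out_handle)
    # Chromosome names sort as strings here, so "10" comes before "2".
    for (chrom, pos), (methylated, total) in sorted(counts.items()):
        writer.writerow([chrom, pos, round(methylated / total, 4), total])


def convert(in_path, out_path):
    """Read a tab-delimited input file and write the csv summary."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        write_summary(count_calls(csv.reader(src, delimiter="\t")), dst)
```

Because everything is keyed on (chromosome, position), the input does not need to be sorted first. The trade-off is that every position is held in memory at once. For very large files, a streaming pass over pre-sorted input (for example with `itertools.groupby`) might be a better fit.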
Does this sound like a reasonable plan? Ideas welcome.