Update - 3-10-15

  • I had some trouble uploading my bisulfite sequencing reads to iPlant, but I eventually got them uploaded. I couldn't use a web link to import them into the Data Store because the SRA website did not offer direct links to the FASTQ-converted versions of the files. It had FTP links to the .sra files, but I decided it would be better to upload the reads in FASTQ format so they could be mapped directly with no extra conversion step. I had to use the web-based bulk upload tool on my Windows desktop, since it didn't work on my Linux machines. I also tried the iDrop desktop client, but the upload failed every time despite reaching 100%.
  • I started running the Bismark genome preparation script on the Arabidopsis thaliana genome. This step has to be done first; the reads are then mapped to the prepared genome. I submitted the job yesterday, but it hasn't started running yet.

  • I've done some testing with test datasets, and I'm trying to work out how to find these hypermethylated regions in the Bismark output. Bismark writes a SAM file with the following columns:
(1) QNAME (seq-ID)
(2) FLAG (this flag tries to take into account the strand a bisulfite read originated from; this is different from ordinary DNA alignment flags!)
(3) RNAME (chromosome)
(4) POS (start position)
(5) MAPQ (only calculated for Bowtie 2, always 255 for Bowtie)
(6) CIGAR
(7) RNEXT
(8) PNEXT
(9) TLEN
(10) SEQ
(11) QUAL (Phred33 scale)
(12) NM-tag (edit distance to the reference)
(13) MD-tag (base-by-base mismatches to the reference)
(14) XM-tag (methylation call string)
(15) XR-tag (read conversion state for the alignment)
(16) XG-tag (genome conversion state for the alignment)
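If I work from the SAM file directly, the XM tag is the column that matters. Below is a minimal sketch, not Bismark's own tooling, that pulls the XM string out of one SAM line and lists methylated cytosines. It assumes Bismark's call letters (upper case Z/X/H = methylated in CG/CHG/CHH context, "." = not a cytosine, U/u = unknown context, skipped here) and an ungapped alignment, so the i-th XM character sits at reference position POS + i. The function name `methylated_positions` is hypothetical.

```python
# Upper-case XM letters mark methylated Cs; map each to its sequence context.
METHYLATED = {"Z": "CG", "X": "CHG", "H": "CHH"}


def methylated_positions(sam_line):
    """Return (chromosome, position, context) for each methylated C in one read.

    Assumption: the alignment is ungapped (CIGAR like "50M"), so XM character i
    maps to reference position POS + i. Gapped reads raise ValueError.
    """
    fields = sam_line.rstrip("\n").split("\t")
    chrom = fields[2]        # (3) RNAME
    pos = int(fields[3])     # (4) POS, 1-based leftmost position
    cigar = fields[5]        # (6) CIGAR
    # Optional tags start at column 12; each looks like "XM:Z:<value>".
    tags = {f[:2]: f[5:] for f in fields[11:]}
    if not (cigar.endswith("M") and cigar[:-1].isdigit()):
        raise ValueError("gapped alignment; this sketch only handles CIGARs like '50M'")
    calls = []
    for i, call in enumerate(tags["XM"]):
        if call in METHYLATED:
            calls.append((chrom, pos + i, METHYLATED[call]))
    return calls


# Made-up read for illustration (not from my data).
line = ("read1\t0\tChr1\t100\t255\t10M\t*\t0\t0\tACGTTCGACA\tIIIIIIIIII"
        "\tNM:i:0\tMD:Z:10\tXM:Z:.Z...Z..H.\tXR:Z:CT\tXG:Z:CT")
print(methylated_positions(line))
```

Handling indels would mean walking the CIGAR string, which is one more reason the extractor output may be less work.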
  • However, depending on compute time, it would probably be easiest to run the methylation extractor script and parse its output instead. It produces separate files for each sequence context (CG, CHG, CHH), with these columns:
(1) seq-ID
(2) methylation state
(3) chromosome
(4) start position (= end position)
(5) methylation call
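Parsing the extractor output could look roughly like the sketch below: it sums methylated and total calls per cytosine across reads, giving a per-position methylation level. It assumes the five tab-separated columns listed above, with "+" meaning methylated and "-" unmethylated in column 2. It also assumes the file may begin with a one-line version header, which it skips by checking for a leading "Bismark"; that check is my guess, not something I've confirmed against real output. `tally_calls` is a hypothetical name.

```python
from collections import defaultdict


def tally_calls(lines):
    """Map (chromosome, position) -> (methylated, total, fraction methylated)."""
    counts = defaultdict(lambda: [0, 0])  # (chrom, pos) -> [methylated, total]
    for line in lines:
        if line.startswith("Bismark"):  # assumed version header line; skip it
            continue
        seq_id, state, chrom, pos, call = line.rstrip("\n").split("\t")
        key = (chrom, int(pos))
        counts[key][1] += 1
        if state == "+":  # "+" = methylated, "-" = unmethylated
            counts[key][0] += 1
    return {key: (m, t, m / t) for key, (m, t) in counts.items()}


# Made-up CpG-context lines for illustration.
example = [
    "Bismark methylation extractor header (illustrative)",
    "read1\t+\tChr1\t101\tZ",
    "read2\t-\tChr1\t101\tz",
    "read3\t+\tChr1\t105\tZ",
]
print(tally_calls(example))
```

Since each context already comes in its own file, running this once per file keeps CG, CHG, and CHH levels separate.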
  • HPC access works. I have no problems logging in through SSH, either on campus or at home. This will be necessary to run my HMR-finder script, which I will write in Python.
  • Question for Eric/Shelley:
  1. When will my iPlant HPC jobs actually start?
To-do This Week:
  1. Write pseudocode for and start coding my HMR-finder script. I need to think hard about how to define an interesting region: how many methylated Cs it needs, and how close together they must be.
  2. Continue the necessary computation steps in iPlant. They should be straightforward, although I don't know how long they will take.
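As a starting point for item 1, here is one possible first-pass definition written as a sketch: on one chromosome, group methylated-C positions into runs where each C is at most `max_gap` bp from the previous one, and keep runs with at least `min_count` methyl-Cs. The function name and both thresholds are hypothetical placeholders, not values I've tested or settled on.

```python
def find_dense_regions(positions, max_gap=50, min_count=5):
    """Return (start, end, n_methyl_C) for dense clusters of methylated Cs.

    positions: methylated-C coordinates on a single chromosome (any order).
    max_gap, min_count: placeholder thresholds (hypothetical, untested).
    """
    regions = []
    run = []
    for p in sorted(positions):
        # A gap larger than max_gap closes the current run.
        if run and p - run[-1] > max_gap:
            if len(run) >= min_count:
                regions.append((run[0], run[-1], len(run)))
            run = []
        run.append(p)
    # Close the final run.
    if len(run) >= min_count:
        regions.append((run[0], run[-1], len(run)))
    return regions


# Made-up positions for illustration.
print(find_dense_regions([10, 20, 35, 40, 300, 310, 1000], max_gap=20, min_count=3))
```

This only counts methylated Cs and ignores unmethylated ones, so a region with many unmethylated Cs mixed in could still pass; weighting by per-position methylation level would be a natural refinement.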