Update - 4-7-15
- Following Eric's suggestions, I tried downloading my Bismark Methylation Extractor summaries and raw data over UA's network (I hadn't grabbed all the files I needed last time anyway). Yeah, that's a lot faster. Also, FYI for anyone else who needs icommands to download files from the data store (it's the best option for large files): put the icommands executables in a directory that's on your $PATH (somewhere like /usr/local/bin). You'll need superuser permissions to copy files there, so either:
sudo cp (files) ($PATH directory)
or
sudo (whatever your file manager is)
or you can add a different directory to your $PATH with export. It's probably best to be consistent about this so you don't end up with tons of random directories in your $PATH.
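As a rough sketch of that last option, assuming the icommands binaries were unpacked into a hypothetical ~/icommands directory (the directory and the add_to_path helper are my own illustration, not anything icommands provides):

```shell
#!/bin/bash
# Hypothetical location of the unpacked icommands binaries; adjust as needed.
ICOMMANDS_DIR="$HOME/icommands"

# Prepend a directory to PATH only if it is not already listed,
# so re-sourcing this file does not pile up duplicate entries.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

add_to_path "$ICOMMANDS_DIR"
```

Putting these lines in ~/.bashrc should make the change stick across sessions; running "command -v iget" afterwards is a quick way to check that the shell can find the binaries.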
There are a bunch of commands in icommands. I'm using some options that let me resume my download in case it fails. You also need the -r (recursive) option to download a directory. Many of these options are explained in the iPlant wiki. Several Linux commands like ls have icommands equivalents (ils, etc.).
Downloading my data:
iget -P -T -r --retries 10 -X iplant_restartfile --lfrestart iplant_lfrestartfile Methylation_Extractor_Output
- I also installed the methylKit R package on the laptop I keep at school most of the time. That way I don't have to wait until I get home. I ran into some strange issues with dependencies not being available, but I was able to get around that by building from source. I know next to nothing about how that's actually done, but I can type in "make _____" like the best of them. Also, it's pretty essential to enable source code repos, which Mint 17.1 for some strange reason disables by default.
- Because the fun doesn't stop there, I found that in order for methylKit to do the %MethylC calculation, the .sam files output by Bismark need to be sorted first. When I started it on my computer before, I hadn't done this; it isn't stated anywhere really obvious. So I also installed samtools, a program that can do this for you. It also needed to be built from source and put on my $PATH. The following samtools command will sort a .sam file for use with methylKit:
samtools sort -o bismark_output-sorted.sam -O sam -T .tmp -@ 3 bismark_output.sam
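With more than one .sam file to sort, the command above can be wrapped in a loop. This is only a sketch: the sort_sam_files function, the DRY_RUN switch, and the per-file temp prefix are my own assumptions, while the samtools flags are the ones from the command above.

```shell
#!/bin/bash
# Sort every .sam file in a directory using the samtools command shown above.
# Usage: sort_sam_files [directory] [threads]
# Set DRY_RUN=1 to print the commands instead of running them.
sort_sam_files() {
    local dir="${1:-.}" threads="${2:-3}" sam out
    for sam in "$dir"/*.sam; do
        [ -e "$sam" ] || continue                      # glob matched nothing
        case "$sam" in *-sorted.sam) continue ;; esac  # skip earlier output
        out="${sam%.sam}-sorted.sam"
        # Assumption: a per-file temp prefix instead of the shared ".tmp".
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo samtools sort -o "$out" -O sam -T "${sam%.sam}.tmp" -@ "$threads" "$sam"
        else
            samtools sort -o "$out" -O sam -T "${sam%.sam}.tmp" -@ "$threads" "$sam"
        fi
    done
}
```

Calling "DRY_RUN=1 sort_sam_files . 3" first prints what would be run for the current directory, which is a cheap sanity check before committing to a long sort.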
- I also wrote a short shell script to process and sort the Bismark Methylation Extractor summaries. This should make it easier to get them into CoGe for viewing, since my Python script will likely be simpler if things are already sorted logically by chromosome # and position (it will go on GitHub as soon as I've tested it):
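#!/bin/bash
# Drop the first (header) line of the file given as $1
sed '1d' "$1" | \
# Sort by column 3 (chromosome), then numerically by column 4 (position)
sort -k3,3 -k4,4n > "${1%.txt}-processed.txt"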
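To see what that pipeline does, here is a small sketch that runs the same sed and sort steps on a made-up file. The process_summary function name, the temp directory, and the sample rows are all invented for illustration and are not real Bismark output.

```shell
#!/bin/bash
# Same pipeline as the script above, wrapped in a function so it can be
# looped over many files. Assumes whitespace-delimited columns with the
# chromosome in column 3 and the position in column 4.
process_summary() {
    sed '1d' "$1" | sort -k3,3 -k4,4n > "${1%.txt}-processed.txt"
}

# Tiny made-up input: one header line, then three tab-separated rows.
demo_dir=$(mktemp -d)
{
    printf 'header line that gets dropped\n'
    printf 'read1\t+\tchr2\t150\tZ\n'
    printf 'read2\t-\tchr1\t900\tz\n'
    printf 'read3\t+\tchr1\t75\tZ\n'
} > "$demo_dir/sample.txt"

process_summary "$demo_dir/sample.txt"
cat "$demo_dir/sample-processed.txt"
```

The header is gone and the rows come out ordered chr1 75, chr1 900, chr2 150. One thing to keep in mind is that the chromosome column is sorted as text, so with real data chr10 would land before chr2.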
- Becky informed me that I should've done some sort of QC check on my reads before all of this, so I also ran fastqc in the iPlant DE. To keep this short: it looks like several different experiments were combined in the SRA, and the read lengths are not all the same.
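fastqc already reports the length distribution, but for a quick cross-check on the command line, something like the following sketch would tally read lengths. It assumes an uncompressed FASTQ file with exactly four lines per record; the read_length_counts function name is my own.

```shell
#!/bin/bash
# Tally read lengths in an uncompressed, four-line-per-record FASTQ file.
# Prints "length<TAB>count", one line per distinct length, sorted by length.
read_length_counts() {
    awk 'NR % 4 == 2 { n[length($0)]++ }    # line 2 of each record is the sequence
         END { for (len in n) printf "%d\t%d\n", len, n[len] }' "$1" | sort -n
}
```

Running "read_length_counts reads.fastq" (reads.fastq being a placeholder name) on a file where every read has the same length should print a single line; more than one line means mixed read lengths.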
To-do:
- Process all my Methylation Extractor output files. Due 4/11/15
- Run methylKit. Due 4/12/15
- Python script for CoGe input. Due 4/14/15