Programs
Bismark:
Bismark is a tool for mapping bisulfite-converted sequence reads to reference genomes. This allows one to determine the methylation status of individual cytosines throughout the genome relatively quickly. The program is written in Perl and accepts reference genomes in FASTA format and sequencing reads in either FASTA or FASTQ format. Bismark is integrated into iPlant and runs on iPlant's HPC system.
Dependencies - Bismark depends on Bowtie or Bowtie 2 for read mapping. This isn't a problem here, since I'll be running it on iPlant, which also provides Bowtie. It also requires Perl in order to run.
How to run -
- Prepare the reference genome.
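Run the genome preparation script. This creates two in silico bisulfite-converted copies of the reference genome: one with every C converted to T, and one with every G converted to A (the equivalent conversion for the reverse strand). This is done with the following command: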
bismark_genome_preparation [options] <path_to_genome_folder>
- Perform the read alignment.
Next, Bismark aligns the experimentally bisulfite-converted reads to the similarly converted versions of the genome:
bismark [options] <genome_folder> {-1 <mates1> -2 <mates2> | <singles>}
- Extract cytosine methylation status.
Now, the methylation status of each cytosine can be output depending on sequence context (CG, CHG, CHH):
bismark_methylation_extractor [options] <filenames>
With the appropriate options, these commands write separate output files for cytosines in each sequence context. A full description of commands and output formats can be found in the Bismark documentation.
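To make the terms above concrete, here is a minimal Python sketch of the two ideas the steps rely on: in silico bisulfite conversion (C -> T, or G -> A for the reverse strand) and classifying a cytosine as CG, CHG, or CHH (H = A, C, or T). The function names `bisulfite_convert` and `cytosine_context` are my own illustrations and are not part of Bismark; Bismark's actual implementation is in Perl and handles far more cases.

```python
def bisulfite_convert(seq, strand="forward"):
    """Fully convert a DNA sequence in silico.

    forward: every C becomes T
    reverse: every G becomes A (the same conversion seen from the other strand)
    """
    seq = seq.upper()
    if strand == "forward":
        return seq.replace("C", "T")
    if strand == "reverse":
        return seq.replace("G", "A")
    raise ValueError("strand must be 'forward' or 'reverse'")


def cytosine_context(seq, pos):
    """Classify the cytosine at 0-based index pos on the given strand.

    Returns "CG", "CHG", "CHH", or "unknown" when the two downstream
    bases are missing or ambiguous (e.g. N or end of sequence).
    """
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError("no cytosine at position %d" % pos)
    # Take the two bases after the cytosine, padding with N at the sequence end.
    window = seq[pos + 1:pos + 3].ljust(2, "N")
    first, second = window[0], window[1]
    if first == "G":
        return "CG"
    if first in "ACT" and second == "G":
        return "CHG"
    if first in "ACT" and second in "ACT":
        return "CHH"
    return "unknown"


if __name__ == "__main__":
    print(bisulfite_convert("ACGTCC"))             # forward-strand conversion
    print(bisulfite_convert("ACGTGG", "reverse"))  # reverse-strand conversion
    print(cytosine_context("CAGT", 0))
```

The sketch only looks at one strand; cytosines on the opposite strand would need the reverse complement first.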
Samtools v1.2:
A suite of tools for working with .sam and .bam files. I'm using the sort function:
samtools sort [options...] [in.bam or .sam]
You can find more information about samtools in the official documentation.
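For scripting the sort step, a small Python helper can assemble the command line. This is only a sketch: the `-O` (output format), `-T` (temporary file prefix), and `-o` (output file) options are how I recall the samtools 1.2 usage text, so check them against the help output of `samtools sort` on your install. The file names and the helper name `samtools_sort_command` are hypothetical.

```python
import shlex


def samtools_sort_command(in_path, out_path, tmp_prefix, out_format="sam"):
    """Build (but do not run) a coordinate-sort command as an argument list.

    Assumed samtools 1.2 options: -O output format, -T temp prefix, -o output file.
    """
    return ["samtools", "sort",
            "-O", out_format,
            "-T", tmp_prefix,
            "-o", out_path,
            in_path]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = samtools_sort_command("sample_bismark.sam",
                                "sample_bismark.sorted.sam",
                                "sample_tmp")
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be passed to `subprocess.check_call(cmd)` once the options have been confirmed.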
methylKit:
An R package for working with bisulfite sequencing data. It can find differentially methylated bases or regions (hyper- and hypomethylated) in all three sequence contexts (CG/CpG, CHG, and CHH) and accepts a variety of input formats. Bismark .sam files must be sorted (see Samtools) before methylKit can read them. It also has a plethora of dependencies, some of which need to be built from source for reasons unknown to me.
- Reading samtools-sorted Bismark .sam files
read.bismark(location, sample.id, assembly, save.folder = NULL, save.context = c("CpG"), read.context = "CpG", nolap = FALSE, mincov = 10, minqual = 20, phred64 = FALSE, treatment)
- Create a methylBase object out of your data
methyl.obj=unite(sample_list, destrand=FALSE)
- Find hypermethylated regions
myDiff = calculateDiffMeth(methyl.obj, num.cores = 8)
myDiff25p.hyper = get.methylDiff(myDiff, difference = 25, qvalue = 0.01, type = "hyper")
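To spell out what the `difference = 25, qvalue = 0.01, type = "hyper"` arguments select, here is a Python sketch of the filter as I understand it: keep positions whose methylation difference is above 25 percentage points and whose q-value is below 0.01. This is not methylKit code; the dictionary keys and the example rows are made up purely for illustration, and the exact comparison operators methylKit uses should be checked in its documentation.

```python
def select_hyper(rows, difference=25.0, qvalue=0.01):
    """Keep rows that look hypermethylated under the given cutoffs.

    Each row is a dict with hypothetical keys:
      "meth_diff": methylation difference in percentage points
      "qvalue":    multiple-testing-adjusted p-value
    """
    return [row for row in rows
            if row["qvalue"] < qvalue and row["meth_diff"] > difference]


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    rows = [
        {"chr": "chr1", "start": 100, "meth_diff": 40.0, "qvalue": 0.001},
        {"chr": "chr1", "start": 200, "meth_diff": -35.0, "qvalue": 0.001},
        {"chr": "chr2", "start": 300, "meth_diff": 30.0, "qvalue": 0.2},
    ]
    for row in select_hyper(rows):
        print(row["chr"], row["start"], row["meth_diff"])
```

Only the first row passes: the second is hypomethylated (negative difference) and the third fails the q-value cutoff.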
Bismark_Sorter:
A custom shell script using sed and sort to process Bismark_Methylation_Extractor files. It removes the header and sorts by chromosome and position. Pass it your file to be processed on the command line like so:
Bismark_Sorter.sh ./placeholder_for_your_Bismark_file.txt
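For reference, the same idea can be sketched in Python. This is not the shell script itself, just an equivalent under some assumptions: the input is tab-separated with the chromosome in the third column and the position in the fourth, and the header is a line beginning with "Bismark methylation extractor". Chromosome names are compared as plain strings, so chr10 sorts before chr2.

```python
def sort_extractor_lines(lines):
    """Drop the header and sort records by chromosome, then position.

    Assumes tab-separated columns:
      read id, methylation state, chromosome, position, methylation call
    """
    records = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the (assumed) version header line.
        if not line or line.startswith("Bismark methylation extractor"):
            continue
        records.append(line.split("\t"))
    # Sort by chromosome name (string order), then numeric position.
    records.sort(key=lambda fields: (fields[2], int(fields[3])))
    return ["\t".join(fields) for fields in records]


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    example = [
        "Bismark methylation extractor version (example header)",
        "read3\t+\tchr2\t15\tZ",
        "read1\t-\tchr1\t200\tz",
        "read2\t+\tchr1\t30\tZ",
    ]
    for line in sort_extractor_lines(example):
        print(line)
```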
CoGe-ifier for Bismark Methylation Extractor:
A custom Python (3.4) script that calculates % methylation on a per-cytosine basis and formats the data for import into CoGe GenomeView as a .csv file. When you run the script, it prompts you for the path of the file to be processed and for the path and filename of the processed output.
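The per-cytosine calculation can be sketched as follows. This is an illustration of the idea, not the script itself. It assumes tab-separated extractor output where the second column is "+" for a methylated call and "-" for an unmethylated one, and the chromosome and position are in the third and fourth columns. The CSV column layout here (chromosome, position, percent) is hypothetical and is not a statement of the format CoGe expects.

```python
import csv
import io
from collections import defaultdict


def percent_methylation(lines):
    """Return {(chromosome, position): percent methylated} from extractor lines."""
    counts = defaultdict(lambda: [0, 0])  # key -> [methylated calls, total calls]
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # header or malformed line
        state, chrom, pos = fields[1], fields[2], int(fields[3])
        if state not in ("+", "-"):
            continue  # ignore anything that is not a methylation call
        if state == "+":
            counts[(chrom, pos)][0] += 1
        counts[(chrom, pos)][1] += 1
    return {key: 100.0 * meth / total for key, (meth, total) in counts.items()}


def to_csv(percentages):
    """Write one row per cytosine (hypothetical layout: chromosome, position, percent)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for (chrom, pos), pct in sorted(percentages.items()):
        writer.writerow([chrom, pos, "%.1f" % pct])
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    example = [
        "read1\t+\tchr1\t10\tZ",
        "read2\t-\tchr1\t10\tz",
        "read3\t+\tchr1\t20\tZ",
    ]
    print(to_csv(percent_methylation(example)), end="")
```

Positions covered by only one or two reads give very coarse percentages, so a minimum coverage filter may be worth adding before import.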