Bismark

  • How to download the tool or source code including installation and usage instructions as well as any source code that might be associated with the executable. This should also include a listing of any dependencies for this tool or script.
  • Required version of the program necessary to perform the desired task
    • bismark v0.2.4
  • Sample dataset and expected results to be output
    • Excerpt from sample input files, chr8.fa (fasta file, reference sequence), and test_reads_bs.fq (fastq file, short reads)
      chr8.fa:
      >chr8
      GCAATTATGACACAAAAAATTAAACAGTGCAGACTGATATATAAATCAAA
      ACAAATGTCCTTTACATGTTTTCTGTTACAGTAGTAACAATATGTGTAAA
      CTTAATTATCATATTTTTTTCTTGTGCTGTGGTTGTGTCCTGGGTTCATT
      CTCTAAAATGCTGTTCACCTTAGACCAGGAGAAATATTAACCATACAGAC
      TCTGTTTCAAGTCATAGCTGAATATTTTCAAAAGAGTGACTTTGTAAAAA
      CATGTTCCAATGGCAAATTGATTCATTGTGATGGGATCAATTATTCCAAA
      GACTTCTTGTCTTTATTTTGTTCCCATGCCTACCTTTTAGCCATAATACA
      test_reads_bs.fq:
      @chr8:144-169_1_0000000000000000000000000_0
      TTTATTTTTTAAAATGTTGTTTATT
      +chr8:144-169_1_0000000000000000000000000_0
      OhhhKhhhhLhhhhhhhhhhRhhhh
      @chr8:440-465_1_0000000000000000000000000_0
      TATAATGTTTTTTAAAATAAAAGAG
      +chr8:440-465_1_0000000000000000000000000_0
      QhhhhhhhhhhhhOhHhhhhhhhhh
      @chr8:1759-1784_0_0000000000000000000000000_0
      TTGTAGGTTATTGAGGAAGGTGAGG
      \+chr8:1759-1784_0_0000000000000000000000000_0
      XhNh[ZhhhhhhQThhKRhhhhhhh
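The fourth line of each FASTQ record stores one Phred score per base as an ASCII character. The following is a minimal Python sketch of how such a line can be decoded, assuming the +64 offset that the example invocation on this page selects with --phred64-quals. The function name decode_qualities is illustrative and is not part of Bismark.

```python
def decode_qualities(quality_line, offset):
    """Return one Phred score per character for the given ASCII offset (33 or 64)."""
    scores = [ord(ch) - offset for ch in quality_line]
    if scores and min(scores) < 0:
        # A negative score means the offset is too large for this file.
        raise ValueError("quality characters fall below the chosen offset")
    return scores


# Quality line of the first read in test_reads_bs.fq
quality = "OhhhKhhhhLhhhhhhhhhhRhhhh"
scores = decode_qualities(quality, offset=64)
print(scores)
```

Decoding the same line with offset 33 gives scores far above 40 for most bases, which is one informal hint that a file uses the +64 encoding.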
    • Excerpt from sample output files, test_reads_bs.fq_bismark.txt, and CpG_context_test_reads_bs.fq_bismark.txt
      • Single-end output format (tab-separated):
        1. <seq-ID>
        2. <read alignment strand>
        3. <chromosome>
        4. <start position>
        5. <end position>
        6. <observed bisulfite sequence>
        7. <equivalent genomic sequence>
        8. <methylation call>
        9. <read conversion>
        10. <genome conversion>
      • Paired-end output format (tab-separated):
        1. <seq-ID>
        2. <read 1 alignment strand>
        3. <chromosome>
        4. <start position>
        5. <end position>
        6. <observed bisulfite sequence 1>
        7. <equivalent genomic sequence 1>
        8. <methylation call 1>
        9. <observed bisulfite sequence 2>
        10. <equivalent genomic sequence 2>
        11. <methylation call 2>
        12. <read 1 conversion>
        13. <genome conversion>
          test_reads_bs.fq_bismark.txt:
          Bismark version: v0.2.4
          chr8:144-169_1_0000000000000000000000000_0      +       chr8    145     169     TTTATTTTTTAAAATGTTGTTTATT       TTCATTCTCTAAAATGCTGTTCACCTT     ..h...h.h.......x....h.hh       CT      CT
          chr8:440-465_1_0000000000000000000000000_0      +       chr8    441     465     TATAATGTTTTTTAAAATAAAAGAG       CACAATGCTTTCTAAAACAAAAGAGTC     h.h....h...h.....h.......       CT      CT
          CpG_context_test_reads_bs.fq_bismark.txt:
          Bismark methylation extractor version v0.2.4
          chr8:3234-3259_0_0000000000000000000000000_0    -       chr8    3254    z
          chr8:3577-3602_1_0000000000000000000000000_0    -       chr8    3579    z
          chr8:1086-1111_1_0000000000000000000000000_0    -       chr8    1101    z
          chr8:3216-3241_1_0000000000000000000000000_0    -       chr8    3231    z
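The single-end format listed above can be read back with a few lines of Python. The sketch below splits one result line into the ten named fields and then re-derives the methylation call from the observed and genomic sequences. It is only an illustration for forward-strand (CT/CT) alignments, and it assumes the usual Bismark letter convention (z/Z for CpG, x/X for CHG, h/H for CHH, upper case meaning methylated), which this page does not spell out. The names SingleEndRecord, parse_single_end_line and call_methylation are hypothetical and are not Bismark's own code.

```python
from collections import namedtuple

SingleEndRecord = namedtuple(
    "SingleEndRecord",
    "seq_id strand chromosome start end read_seq genome_seq "
    "methylation_call read_conversion genome_conversion",
)


def parse_single_end_line(line):
    """Split one tab-separated single-end result line into named fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError("expected 10 tab-separated fields, got %d" % len(fields))
    record = SingleEndRecord(*fields)
    return record._replace(start=int(record.start), end=int(record.end))


def call_methylation(read_seq, genome_seq):
    """Re-derive the call string for a forward-strand (CT/CT) alignment.

    Assumes genome_seq extends two bases past the read, as in the excerpt,
    so that the context of the last cytosines can be determined.
    """
    calls = []
    for i, read_base in enumerate(read_seq):
        if genome_seq[i] != "C" or read_base not in "CT":
            calls.append(".")  # not a cytosine position, or a mismatch
            continue
        downstream = genome_seq[i + 1:i + 3]
        if downstream[:1] == "G":
            letter = "z"  # CpG context
        elif downstream[1:2] == "G":
            letter = "x"  # CHG context
        else:
            letter = "h"  # CHH context
        # A retained C is read as methylated, a converted T as unmethylated.
        calls.append(letter.upper() if read_base == "C" else letter)
    return "".join(calls)


# First alignment from the test_reads_bs.fq_bismark.txt excerpt
line = "\t".join([
    "chr8:144-169_1_0000000000000000000000000_0", "+", "chr8", "145", "169",
    "TTTATTTTTTAAAATGTTGTTTATT", "TTCATTCTCTAAAATGCTGTTCACCTT",
    "..h...h.h.......x....h.hh", "CT", "CT",
])
record = parse_single_end_line(line)
print(record.chromosome, record.start, record.end)
print(call_methylation(record.read_seq, record.genome_seq))
```

For both alignments in the excerpt, the re-derived string matches the methylation call reported in column 8.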
  • Set of parameters and command line switches that match the expected execution of the tool including the possible command line definitions according to the occurrence of optional parameters. Also, validation instructions for parameters are requested.
    • Running the Bismark genome preparation
      • USAGE:
        bismark_genome_preparation [options] <arguments>
      • OPTIONS:

        | parameter | brief description of the parameter | required | default value | text, number, or file/path | description of validation rules |
        |---|---|---|---|---|---|
        | --help/--man | Displays this help file | N | none | | |
        | --verbose | Print verbose output for more details or debugging | N | none | | |
        | --path_to_bowtie | The full path to the bowtie installation on your system | N | none | | |
        | --yes/--yes_to_all | Answer yes to safety related questions | N | none | | |

      • ARGUMENTS:

        | argument | brief description of the argument | required | default value | text, number, or file/path |
        |---|---|---|---|---|
        | path_to_genome_folder | The full path to the folder containing the genome to be bisulfite converted | Y | none | path |
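As a rough illustration of what "bisulfite converted" means for a reference sequence, the sketch below applies the two full in-silico conversions commonly used for bisulfite alignment: C to T, and G to A for reads from the reverse strand. This shows the general technique under that assumption and is not code taken from bismark_genome_preparation. The function name bisulfite_convert is hypothetical.

```python
def bisulfite_convert(sequence, conversion):
    """Return a fully converted copy of a DNA sequence.

    conversion is "CT" (every C becomes T) or "GA" (every G becomes A).
    """
    sequence = sequence.upper()
    if conversion == "CT":
        return sequence.replace("C", "T")
    if conversion == "GA":
        return sequence.replace("G", "A")
    raise ValueError("conversion must be 'CT' or 'GA'")


# First bases of the chr8.fa excerpt
reference = "GCAATTATGACAC"
print(bisulfite_convert(reference, "CT"))
print(bisulfite_convert(reference, "GA"))
```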

    • Running Bismark
      • USAGE:
        bismark [options] <genome_folder> {-1 <mates1> -2 <mates2> | <singles>}
      • OPTIONS:

        | parameter | brief description of the parameter | required | default value | text, number, or file/path | description of validation rules |
        |---|---|---|---|---|---|
        | -q/--fastq | The query input files (specified as <mate1>,<mate2> or <singles>) are FASTQ files | Y for FASTQ input | none | | |
        | -f/--fasta | The query input files (specified as <mate1>,<mate2> or <singles>) are FASTA files. All quality values are assumed to be 40 on the Phred scale | Y for FASTA input | none | | |
        | -s/--skip | Skip the first <int> reads or read pairs from the input | N | 0 | integer | >=0 |
        | -u/--qupto | Only align the first <int> reads or read pairs from the input | N | none | integer | >=0 |
        | --phred33-quals | FASTQ qualities are ASCII chars equal to the Phred quality plus 33 | N | Y | | |
        | --phred64-quals | FASTQ qualities are ASCII chars equal to the Phred quality plus 64 | N | N | | |
        | --solexa-quals | Convert FASTQ qualities from solexa-scaled (which can be negative) to phred-scaled | N | N | | |
        | --solexa1.3-quals | Same as --phred64-quals | N | N | | |
        | --path_to_bowtie | The full path to the bowtie installation on your system | N | none | | |
        | -n/--seedmms | The maximum number of mismatches permitted in the "seed" (see -l/--seedlen) | N | 0 | integer | 0, 1, 2 or 3 |
        | -l/--seedlen | The "seed length", i.e. the number of bases at the high-quality end of the read to which the -n ceiling applies | N | 28 | integer | >=0 |
        | -e/--maqerr | Maximum permitted total of quality values at all mismatched read positions throughout the entire alignment, not just in the "seed" | N | 70 | integer | >=0 |
        | --chunkmbs | The number of megabytes of memory a given thread is given to store path descriptors in --best mode. Best-first search must keep track of many paths at once to ensure it is always extending the path with the lowest cumulative cost. Bowtie tries to minimize the memory impact of the descriptors, but they can still grow very large in some cases. If you receive an error message saying that chunk memory has been exhausted in --best mode, increase this parameter to dedicate more memory to the descriptors | N | 64 | integer | >=0 |
        | -I/--minins | The minimum insert size for valid paired-end alignments | N | 0 | integer | >=0 |
        | -X/--maxins | The maximum insert size for valid paired-end alignments | N | 250 | integer | >=0 |
        | --best | Make Bowtie guarantee that reported singleton alignments are "best" in terms of stratum | N | Y | | |
        | --no_best | Disables the --best option, which is on by default. This can speed up the alignment process, e.g. for testing purposes, but disabling --best is not recommended for credible results | N | none | | |
        | --directional | The user may specify whether the sequencing library was constructed in a strand-specific manner. In this case the strands complementary to the original strands are merely theoretical and should not exist in reality. Thus, specifying --directional will only report alignments to the original top or bottom strands. This is the recommended option for strand-specific libraries | N | none | | |
        | --quiet | Print nothing besides alignments | N | none | | |
        | -h/--help | Displays help file | N | none | | |
        | -v/--version | Displays version information | N | none | | |

      • ARGUMENTS:

        | argument | brief description of the argument | required | default value | text, number, or file/path |
        |---|---|---|---|---|
        | genome_folder | The full path to the folder containing the unmodified reference genome as well as the subfolders created by the Bismark_Genome_Preparation script | Y | none | path |
        | -1 | Comma-separated list of files containing the #1 mates | Y for paired-end reads | none | files |
        | -2 | Comma-separated list of files containing the #2 mates | Y for paired-end reads | none | files |
        | singles | A comma-separated list of files containing the reads to be aligned | Y for single-end reads | none | files |
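The validation rules in the Bismark options table can be checked before a job is launched. The sketch below is one possible way to do that in Python. It covers only the integer options and uses exactly the ranges listed in the table. The names INTEGER_RULES and validate_options are illustrative, and nothing here is part of Bismark itself.

```python
# Rules copied from the options table: long flag -> (rule text, check function).
INTEGER_RULES = {
    "--skip": (">=0", lambda v: v >= 0),
    "--qupto": (">=0", lambda v: v >= 0),
    "--seedmms": ("0, 1, 2 or 3", lambda v: 0 <= v <= 3),
    "--seedlen": (">=0", lambda v: v >= 0),
    "--maqerr": (">=0", lambda v: v >= 0),
    "--chunkmbs": (">=0", lambda v: v >= 0),
    "--minins": (">=0", lambda v: v >= 0),
    "--maxins": (">=0", lambda v: v >= 0),
}


def validate_options(options):
    """Return a list of error messages for a {flag: value} mapping."""
    errors = []
    for flag, value in options.items():
        if flag not in INTEGER_RULES:
            continue  # flags without a listed rule are not checked here
        rule_text, check = INTEGER_RULES[flag]
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append("%s expects an integer, got %r" % (flag, value))
        elif not check(value):
            errors.append("%s must be %s, got %d" % (flag, rule_text, value))
    return errors


# Values taken from the paired-end example invocation on this page
print(validate_options({"--seedmms": 1, "--seedlen": 20, "--minins": 60, "--maxins": 350}))
print(validate_options({"--seedmms": 4}))
```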

    • Running the methylation extractor
      • USAGE:
        methylation_extractor [options] <filenames>
      • OPTIONS:

        | parameter | brief description of the parameter | required | default value | text, number, or file/path | description of validation rules |
        |---|---|---|---|---|---|
        | -s/--single-end | Input file(s) are Bismark result file(s) generated from single-end read data | Y for single-end reads | none | | |
        | -p/--paired-end | Input file(s) are Bismark result file(s) generated from paired-end read data | Y for paired-end reads | none | | |
        | --no_overlap | For paired-end reads it is theoretically possible that read_1 and read_2 overlap. This option avoids scoring overlapping methylation calls twice | N | none | | |
        | --fasta | Choosing this option will print out the genomic sequences that correspond to the bisulfite mapped reads in FastA format | N | none | | |
        | --ignore | Ignore the first <int> bp when processing the methylation call string | N | 0 | integer | >=0 |
        | --comprehensive | Specifying this option will merge the methylation information from all four possible strands into context-dependent output files | N | none | | |
        | --merge_non_CpG | This will produce two output files (in --comprehensive mode) or eight strand-specific output files (default) for Cs in (i) CpG context (ii) any non-CpG context | N | none | | |
        | --report | Prints out a short methylation summary and the parameters used to run this script | N | none | | |
        | --version | Displays version information | N | none | | |
        | -h/--help | Displays this help file and exits | N | none | | |

      • ARGUMENTS:

        | argument | brief description of the argument | required | default value | text, number, or file/path |
        |---|---|---|---|---|
        | filenames | A space-separated list of result files in Bismark format | Y | none | files |
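To make the effect of --ignore concrete, the sketch below tallies the letters of a methylation call string by sequence context, optionally skipping the first few positions. It assumes the usual Bismark letter convention (z/Z for CpG, x/X for CHG, h/H for CHH, upper case meaning methylated) and is a simplified illustration, not the extractor's actual logic. The name summarize_calls is hypothetical.

```python
from collections import Counter

# Assumed letter convention: lower case = unmethylated, upper case = methylated.
CONTEXTS = {"z": "CpG", "x": "CHG", "h": "CHH"}


def summarize_calls(call_string, ignore=0):
    """Count calls per (context, state), skipping the first `ignore` positions."""
    if ignore < 0:
        raise ValueError("ignore must be >= 0")
    counts = Counter()
    for letter in call_string[ignore:]:
        context = CONTEXTS.get(letter.lower())
        if context is None:
            continue  # '.' and any other character carry no call
        state = "methylated" if letter.isupper() else "unmethylated"
        counts[(context, state)] += 1
    return counts


# Methylation call of the first alignment in the sample output
calls = "..h...h.h.......x....h.hh"
print(summarize_calls(calls))
print(summarize_calls(calls, ignore=3))
```

With ignore=3 the call at the third position is dropped, so the CHH count falls by one while the CHG count is unchanged.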

  • Example invocation of the command line application and its associated parameters such that it can perform an analysis
    • Running the Bismark genome preparation
      ~/bin/bismark_v0.2.4/bismark_genome_preparation --verbose --path_to_bowtie ~/bin/bowtie-0.12.7/ ~/sequence/test/
    • Running Bismark
      single-end:
      ~/bin/bismark_v0.2.4/bismark -q --phred64-quals --path_to_bowtie ~/bin/bowtie-0.12.7/ -n 1 -l 20 ~/sequence/test/ test_reads_bs.fq
      paired-end:
      ~/bin/bismark_v0.2.4/bismark -q --phred64-quals --path_to_bowtie ~/bin/bowtie-0.12.7/ -n 1 -l 20 -I 60 -X 350 ~/sequence/test/ -1 test_reads_bs1.fq -2 test_reads_bs2.fq
    • Running the methylation extractor
      single-end:
      ~/bin/bismark_v0.2.4/methylation_extractor -s --comprehensive --report test_reads_bs.fq_bismark.txt
      paired-end:
      ~/bin/bismark_v0.2.4/methylation_extractor -p --comprehensive --report test_reads_bs.fq_bismark.txt
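When these invocations are scripted, it can help to assemble the argument list programmatically so that the single-end and paired-end forms from the USAGE line are never mixed. The sketch below is a minimal Python illustration of that idea. The function build_bismark_command is hypothetical, and the executable name "bismark" assumes the program is on the PATH rather than under ~/bin/bismark_v0.2.4/.

```python
def build_bismark_command(genome_folder, singles=None, mates1=None, mates2=None, options=()):
    """Build an argument list following: bismark [options] <genome_folder> {-1 <mates1> -2 <mates2> | <singles>}."""
    command = ["bismark", *options, genome_folder]
    if singles and not (mates1 or mates2):
        # Single-end: one comma-separated list of read files
        command.append(",".join(singles))
    elif mates1 and mates2 and not singles:
        # Paired-end: both mate lists are required
        command += ["-1", ",".join(mates1), "-2", ",".join(mates2)]
    else:
        raise ValueError("give either singles, or both mates1 and mates2")
    return command


shared = ["-q", "--phred64-quals", "-n", "1", "-l", "20"]
single_end = build_bismark_command("sequence/test/", singles=["test_reads_bs.fq"], options=shared)
paired_end = build_bismark_command(
    "sequence/test/",
    mates1=["test_reads_bs1.fq"],
    mates2=["test_reads_bs2.fq"],
    options=shared + ["-I", "60", "-X", "350"],
)
print(" ".join(single_end))
print(" ".join(paired_end))
```

The resulting list can be handed to subprocess.run(command, check=True). Paths beginning with ~ are not expanded in that case, so they would first need os.path.expanduser.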