BWA_index_mem-0.7.10

The DE Quick Start tutorial provides an introduction to basic DE functionality and navigation.

Please work through the tutorial and add your comments at the bottom of this page, or send them by email to xwang@cshl.edu. Thank you.

Rationale and background:

BWA: Fast and accurate short read alignment with Burrows-Wheeler Transform

Li H. and Durbin R.

Bioinformatics 2009; 25:1754-60. [PMID: 19451168]

BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It first constructs the FM-index for the reference genome (the index command) and is then invoked with different sub-commands for the three alignment algorithms: BWA-backtrack, BWA-SW, and BWA-MEM. BWA-MEM is the latest algorithm and is generally recommended for high-quality queries because it is faster and more accurate. It supports both single (SR) and paired-end (PE) reads and performs chimeric alignment. It is applicable to a wide range of query lengths, from 70 bp to 1 Mbp, and performs better than BWA-backtrack on 70-100 bp Illumina reads.
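For orientation, the two-step flow described above (build the index once, then align) corresponds roughly to the command lines assembled below. This is a minimal Python sketch, not the app's actual wrapper: it only builds argument lists, the helper names are hypothetical, and running the commands would additionally assume that `bwa` is installed and on the PATH.

```python
from typing import List, Optional


def bwa_index_cmd(reference_fasta: str) -> List[str]:
    """Command that builds the FM-index files next to the reference FASTA."""
    return ["bwa", "index", reference_fasta]


def bwa_mem_cmd(reference_fasta: str, reads_1: str,
                reads_2: Optional[str] = None) -> List[str]:
    """Command that aligns single-end (one file) or paired-end (two files) reads.

    bwa mem writes SAM to standard output, so a caller would redirect
    the output to a .sam file.
    """
    cmd = ["bwa", "mem", reference_fasta, reads_1]
    if reads_2 is not None:
        cmd.append(reads_2)  # second file holds the mates for PE reads
    return cmd


if __name__ == "__main__":
    ref = "Sorbi1.31.chr8.reNm.fa"
    print(" ".join(bwa_index_cmd(ref)))
    print(" ".join(bwa_mem_cmd(ref, "G3_P_H3_rep1_chr8_R1.fq",
                               "G3_P_H3_rep1_chr8_R2.fq")))
```

The lists can be passed to `subprocess.run` as they are; keeping them as lists rather than shell strings avoids quoting problems with unusual file names.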

This AGAVE/DE app wraps the bwa-index and bwa-mem modules of BWA. It was built for ChIP-Seq workflows but is not limited to them. It takes FASTQ files as input and produces alignments in SAM/BAM format.

BWA software (http://bio-bwa.sourceforge.net)

Pre-Requisites

A CyVerse account. (Register for a CyVerse account at user.cyverse.org)
Mandatory arguments
1. Sequences folder for protein of interest (Note: files may be in FASTA or FASTQ format; for PE reads, the file names must indicate the read end, e.g., test_R1.fq and test_R2.fq)
2. Sequences folder for background control (same format and naming requirements as item 1)
3. Reference genome sequence in FASTA format
4. Read type: SR or PE
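The naming requirement for PE reads can be illustrated with a short sketch that groups the files of a sequences folder into R1/R2 pairs. The regular expression and the function name are assumptions made for illustration; the app's own matching rules may differ.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Assumed convention: <sample>_R1.<ext> and <sample>_R2.<ext>
_END_PATTERN = re.compile(
    r"^(?P<sample>.+)_R(?P<end>[12])\.(?P<ext>fq|fastq|fa|fasta)$"
)


def pair_reads(filenames: List[str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Group read files by sample name.

    Returns {sample: (R1 file, R2 file)}; the second element is None
    when no R2 file is present (single-end data).
    """
    found: Dict[str, Dict[str, str]] = defaultdict(dict)
    for name in filenames:
        match = _END_PATTERN.match(name)
        if match is None:
            continue  # ignore files that do not follow the convention
        found[match.group("sample")][match.group("end")] = name

    pairs: Dict[str, Tuple[str, Optional[str]]] = {}
    for sample, ends in sorted(found.items()):
        if "1" not in ends:
            raise ValueError(f"{sample}: R2 file present but R1 file missing")
        pairs[sample] = (ends["1"], ends.get("2"))
    return pairs


if __name__ == "__main__":
    files = ["test_R1.fq", "test_R2.fq", "README.txt"]
    for sample, (r1, r2) in pair_reads(files).items():
        print(sample, r1, r2)
```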
Optional arguments:

Minimum score: Don’t output alignments with a score lower than INT
Type of sequencing reads: Illumina, PacBio, Oxford Nanopore, or intra-species contigs to ref
Sort method for BAM: Sort alignments by leftmost coordinate or by read name
Mark shorter split: Mark shorter split hits as secondary (for Picard compatibility)
Sam output: Keep or purge the SAM alignments
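As a rough guide to what these options correspond to on the command line, the sketch below maps them to `bwa mem` flags (`-T`, `-x`, `-M`) and to the `-n` flag of `samtools sort`. The mapping is an assumption about how such a wrapper could work, not a description of this app's code. The preset names are those listed by recent BWA releases, so check the `bwa mem` usage message of the installed version before relying on them.

```python
from typing import List, Optional

# Preset names accepted by `bwa mem -x` in recent BWA releases.
# Illumina reads use the defaults, so no preset is passed for them.
READ_TYPE_PRESETS = {
    "Illumina": None,
    "PacBio": "pacbio",
    "Oxford Nanopore": "ont2d",
    "Intra-species contigs to ref": "intractg",
}


def mem_option_flags(min_score: Optional[int] = None,
                     read_type: str = "Illumina",
                     mark_shorter_split: bool = False) -> List[str]:
    """Translate the optional arguments into bwa mem flags."""
    flags: List[str] = []
    if min_score is not None:
        flags += ["-T", str(min_score)]  # minimum score to output
    preset = READ_TYPE_PRESETS[read_type]
    if preset is not None:
        flags += ["-x", preset]          # read-type preset
    if mark_shorter_split:
        flags.append("-M")               # shorter split hits become secondary
    return flags


def sort_flags(by_read_name: bool = False) -> List[str]:
    """samtools sort orders by leftmost coordinate unless -n is given."""
    return ["-n"] if by_read_name else []


if __name__ == "__main__":
    print(mem_option_flags(min_score=30, read_type="PacBio",
                           mark_shorter_split=True))
```

Note that a BAM index (.bai) can only be built for a coordinate-sorted file, so sorting by read name would normally rule out the indexing step.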

Sample data

Test data for BWA-index-mem are provided at /iplant/home/xiaofei_iplant/Sorghum_chr8/chr8_test:

G3_P_K4me3_chr8
G3_P_K4me3_rep1_chr8_R1.fq and G3_P_K4me3_rep1_chr8_R2.fq
G3_P_K4me3_rep2_chr8_R1.fq and G3_P_K4me3_rep2_chr8_R2.fq
G3_P_H3_chr8
G3_P_H3_rep1_chr8_R1.fq and G3_P_H3_rep1_chr8_R2.fq
G3_P_H3_rep2_chr8_R1.fq and G3_P_H3_rep2_chr8_R2.fq
Sorbi1.31.chr8.reNm.fa

Results

Successful execution of the BWA-index-mem pipeline creates a directory named out for each sample. The directory contains SAM/BAM files for both the protein-of-interest samples and the background input, which can be processed further for downstream analysis and visualization.

Outputs

G3_P_H3_chr8_BWA_sam*
1. G3_P_H3_rep1_chr8_R.sam
2. G3_P_H3_rep2_chr8_R.sam
G3_P_H3_chr8_BWA_bam
1. G3_P_H3_rep1_chr8_R.sorted.bam
2. G3_P_H3_rep1_chr8_R.sorted.bam.bai
3. G3_P_H3_rep2_chr8_R.sorted.bam
4. G3_P_H3_rep2_chr8_R.sorted.bam.bai
G3_P_K4me3_chr8_BWA_sam*
1. G3_P_K4me3_rep1_chr8_R.sam
2. G3_P_K4me3_rep2_chr8_R.sam
G3_P_K4me3_chr8_BWA_bam
1. G3_P_K4me3_rep1_chr8_R.sorted.bam
2. G3_P_K4me3_rep1_chr8_R.sorted.bam.bai
3. G3_P_K4me3_rep2_chr8_R.sorted.bam
4. G3_P_K4me3_rep2_chr8_R.sorted.bam.bai

*SAM folders are optional; they are kept or purged according to the Sam output setting.
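The output names listed above appear to be derived from the read file names by dropping the read-end digit and the extension (for example, G3_P_H3_rep1_chr8_R1.fq becomes G3_P_H3_rep1_chr8_R.sam). The sketch below reproduces that pattern. The rule is inferred from the listing, not taken from the app's source, and the function name is hypothetical.

```python
import re
from typing import Dict


def expected_outputs(read1_name: str) -> Dict[str, str]:
    """Predict the SAM, sorted BAM, and BAM index names for one sample.

    Assumes input names end in _R1 or _R2 followed by a FASTA/FASTQ
    extension, as in the sample data.
    """
    match = re.match(r"^(?P<stem>.+_R)[12]\.(?:fq|fastq|fa|fasta)$", read1_name)
    if match is None:
        raise ValueError(f"unexpected read file name: {read1_name}")
    stem = match.group("stem")
    return {
        "sam": stem + ".sam",
        "bam": stem + ".sorted.bam",
        "bai": stem + ".sorted.bam.bai",
    }


if __name__ == "__main__":
    for kind, name in expected_outputs("G3_P_H3_rep1_chr8_R1.fq").items():
        print(kind, name)
```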