AntEpiSeeker 2.0

Epistatic interactions of multiple single nucleotide polymorphisms (SNPs) are now believed to affect individual susceptibility to common diseases. The detection of such interactions, however, is a challenging task in large scale association studies. The ant colony optimization algorithm (ACO) has been shown to be useful in detecting epistatic interactions. AntEpiSeeker, a new two-stage ant colony optimization algorithm, is a powerful and efficient tool for detecting epistasis in a case-control design.

Citation: AntEpiSeeker: detecting epistatic interactions for case-control studies using a two-stage ant colony optimization algorithm. BMC Res Notes 2010, 3: 117. [From the Web site.]

Usage

The application is available in the Discovery Environment and can be run from the command line. All of the command-line options are supported in the Discovery Environment.

The application itself takes no command-line parameters; it reads its settings from a parameter file named "parameters.txt". A Python wrapper is available to build the parameter file from command-line options.

./antepiseeker.py <options>

Limitations

  1. The phenotypes must be binary traits, e.g. cancer versus healthy.
  2. The genotypes MUST come from biallelic SNPs (only two alleles per locus), so that each genotype can be coded as 0, 1 or 2.

Example Data

Test data can be found at:

Community Data / iplantcollaborative / example_data / antepiseeker / input

Results produced with the parameter file and the relevant input files can be found at:

Community Data / iplantcollaborative / example_data / antepiseeker / output

Application Options

Input File(s)

The application requires the Genotype File, the Pathway2SNP File, or both as input. If your data are in one of the standard PLINK formats (.map/.ped or .bed/.bim/.fam), use PLINK Conversion to convert them to the transposed PLINK file format (.tfam/.tped). Then use PLINK 2 AntEpiSeeker Conversion to convert the transposed PLINK files (.tfam/.tped) to the AntEpiSeeker input format for the Genotype File. If your data are in a non-PLINK format, first use Tassel3 Conversion or Tassel4 Conversion to obtain a PLINK file format, then follow the directions above.
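The conversion apps named above are the documented route. Purely as an illustration of what the final step involves, the following is a minimal sketch, not the implementation of PLINK 2 AntEpiSeeker Conversion. It assumes PLINK's default phenotype coding in the .tfam file (1 = unaffected, 2 = affected), maps affected samples to status 1, and codes each genotype as the number of copies of the less frequent allele; the function name and these coding choices are assumptions made for the example.

```python
from collections import Counter

def tped_to_genotype_rows(tfam_lines, tped_lines):
    """Illustrative conversion of transposed PLINK text to 0/1/2 genotype rows."""
    # .tfam column 6 is the phenotype (assumed coding: 1 = unaffected, 2 = affected).
    status = []
    for line in tfam_lines:
        fields = line.split()
        if not fields:
            continue
        if fields[5] not in {"1", "2"}:
            raise ValueError(f"non-binary or missing phenotype: {fields[5]!r}")
        status.append("1" if fields[5] == "2" else "0")

    rows = [["class"] + status]
    for line in tped_lines:
        fields = line.split()
        if not fields:
            continue
        # .tped columns: chromosome, SNP id, genetic distance, position, then allele pairs.
        snp_id, alleles = fields[1], fields[4:]
        if len(alleles) != 2 * len(status):
            raise ValueError(f"{snp_id}: expected {2 * len(status)} alleles")
        if "0" in alleles:
            raise ValueError(f"{snp_id}: missing genotypes are not handled here")
        counts = Counter(alleles)
        if len(counts) > 2:
            raise ValueError(f"{snp_id}: more than two alleles")
        # Assumption: count copies of the less frequent allele (ties broken alphabetically).
        counted = min(counts, key=lambda a: (counts[a], a)) if len(counts) == 2 else None
        dosages = [str((a == counted) + (b == counted))
                   for a, b in zip(alleles[0::2], alleles[1::2])]
        rows.append([snp_id] + dosages)
    return ["\t".join(row) for row in rows]
```

Each returned string is one tab-delimited row, with the status row first and one row per SNP after it.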

Genotype File

The user should create a tab-delimited file which contains the case-control genotype data as an input for the program. The first row of this input file contains the sample status (0 or 1). The following rows are the genotype data which should be coded by 0, 1 and 2 with each row corresponding to one SNP. For example:

class 1 1 1 0 0
rs1   0 1 1 2 2
rs2   1 2 0 1 2
rs3   2 2 1 2 1
rs4   2 2 1 1 2
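Before submitting a Genotype File, it can help to check it against the layout described above. The following is a minimal sketch of such a check; the function name is hypothetical and the check is not part of AntEpiSeeker itself. It only enforces what this section states: tab-delimited rows, a first row of 0/1 sample statuses, and SNP rows coded 0, 1 or 2.

```python
def validate_genotype_table(lines):
    """Check genotype rows and return (number of samples, number of SNPs)."""
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    if not rows:
        raise ValueError("empty genotype file")

    header, snp_rows = rows[0], rows[1:]
    n_samples = len(header) - 1
    if n_samples < 1:
        raise ValueError("first row must hold a label followed by sample statuses")

    # First row: a label, then one binary status per sample.
    if any(value not in {"0", "1"} for value in header[1:]):
        raise ValueError("sample status must be 0 or 1")

    # Remaining rows: SNP id, then one genotype (0, 1 or 2) per sample.
    for row in snp_rows:
        if len(row) != len(header):
            raise ValueError(f"{row[0]}: expected {n_samples} genotypes")
        if any(value not in {"0", "1", "2"} for value in row[1:]):
            raise ValueError(f"{row[0]}: genotypes must be coded 0, 1 or 2")

    return n_samples, len(snp_rows)


# The example from this section, written with explicit tab separators.
EXAMPLE = (
    "class\t1\t1\t1\t0\t0\n"
    "rs1\t0\t1\t1\t2\t2\n"
    "rs2\t1\t2\t0\t1\t2\n"
    "rs3\t2\t2\t1\t2\t1\n"
    "rs4\t2\t2\t1\t1\t2\n"
)
print(validate_genotype_table(EXAMPLE.splitlines()))
```

For a file on disk, the same function can be called on an open file handle, since it only iterates over lines.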

Pathway2SNP File

The user should also make a tab-delimited file which contains information of pathway-SNP associations. Each pathway should be placed in one row, with the first column specifying the pathway ID and the following columns containing its associated SNPs. For example:

pw1 rs1 rs2 rs3
pw2 rs4 rs5
pw3 rs6 rs7 rs8 rs9
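As a rough illustration of the Pathway2SNP layout, the sketch below reads such a file into a dictionary and counts how many pathways each SNP belongs to, which is the quantity the --weighted option refers to when it mentions overlapping SNPs. The function names are hypothetical and this is not how AntEpiSeeker itself parses the file.

```python
def parse_pathway2snp(lines):
    """Map each pathway ID (first column) to its list of SNP IDs."""
    pathways = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not fields[0]:
            continue  # skip blank lines
        pathways[fields[0]] = [snp for snp in fields[1:] if snp]
    return pathways


def snp_pathway_counts(pathways):
    """Count the number of pathways in which each SNP appears."""
    counts = {}
    for snps in pathways.values():
        for snp in set(snps):
            counts[snp] = counts.get(snp, 0) + 1
    return counts


# The example from this section, written with explicit tab separators.
EXAMPLE_PATHWAYS = "pw1\trs1\trs2\trs3\npw2\trs4\trs5\npw3\trs6\trs7\trs8\trs9\n"
pathway_map = parse_pathway2snp(EXAMPLE_PATHWAYS.splitlines())
print(pathway_map["pw2"])
```

In the example no SNP is shared between pathways, so every count returned by snp_pathway_counts is 1.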

Parameters

--iAntCount=<int>

Number of ants.

--iItCountHsize=<int>

Number of iterations for each size of SNP sets

--alpha=<float>

Weight given to pheromone deposited by ants.

--rou=<float>

Evaporation rate in Ant Colony Optimization.

--phe=<float>

Initial pheromone level for each locus.

--iTopModel=<int>

Number of top-ranking SNP sets in the first stage.

--iTopLoci=<int>

Number of loci with top-ranking pheromone in the first stage.

--iEpiModel=<int>

Number of SNPs in an epistatic interaction

--largesetsize=<float>

Size of the large SNP set

--smallsetsize=<float>

Size of the small SNP set

--pvalue=<float>

P value threshold (after Bonferroni correction).

--pwprop=<float>

Proportion of SNPs within a pathway used for computing the pathway pheromone level

--weighted=<float>

Whether to adjust the contributions of overlapping SNPs for the number of pathways they belong to (1: adjust; 0: do not adjust).

--INPFILE=<filename>

Input file for case-control genotype data.

--PWSNPFL=<filename>

Input file for mapping pathways and SNPs

--OUTFILE=<filename>

Output filename for detected epistatic interactions.

--PWYFILE=<filename>

Output filename for showing sorted pathways
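To show how the options above fit together, here is a minimal sketch that assembles a wrapper command line from a dictionary. The option names are the ones documented in this section; every value and file name is a placeholder chosen for illustration, not a recommended or default setting, and the helper function is hypothetical.

```python
import shlex

def build_wrapper_command(options, wrapper="./antepiseeker.py"):
    """Return the argument list for the wrapper, one --name=value item per option."""
    return [wrapper] + [f"--{name}={value}" for name, value in options.items()]


# Placeholder values only; choose settings appropriate to your data set.
options = {
    "iAntCount": 1000,
    "iItCountHsize": 150,
    "alpha": 1,
    "rou": 0.05,
    "phe": 100,
    "iTopModel": 1000,
    "iTopLoci": 200,
    "iEpiModel": 2,
    "largesetsize": 6,
    "smallsetsize": 3,
    "pvalue": 0.01,
    "pwprop": 0.5,
    "weighted": 1,
    "INPFILE": "genotypes.txt",      # hypothetical file name
    "PWSNPFL": "pathway2snp.txt",    # hypothetical file name
    "OUTFILE": "results.txt",        # hypothetical file name
    "PWYFILE": "pathways.txt",       # hypothetical file name
}

command = build_wrapper_command(options)
# Print a shell-safe version of the command for inspection.
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) from the directory that contains the wrapper.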

Output File(s)

The program generates several output files. Per the documentation, a log file contains the intermediate results, including the detected top-ranked haplotypes and the loci with the highest pheromone levels. The result file contains the detected epistatic interactions that are significant at the user-defined p value threshold (after Bonferroni correction).

Tool Source