AXPARAFIT

Prepared by: Liya Wang

Tool Name: AxParafit

Homepage: http://icwww.epfl.ch/~stamatak/AxParafit.html

Platforms: Mac OS X, Linux, Windows

Implementation Language: C

Mailing List: No

Documentation/Manual: http://icwww.epfl.ch/%7Estamatak/index-Dateien/countManualAx.php

Overview: AxParafit (AleXandros's version of ParaFit) and AxPcoords are highly optimized versions of Pierre Legendre's ParaFit and DistPCoA programs for the statistical analysis of host-parasite coevolution (fitting host and parasite trees). AxParafit has additionally been parallelized with MPI (Message Passing Interface) to run on compute clusters. AxPcoords, which is used together with AxParafit, is also distributed from the AxParafit site.

Literature:
Stamatakis, A., A. Auch, J. Meier-Kolthoff, and M. Göker. 2007. AxPcoords and Parallel AxParafit: Statistical co-phylogenetic analyses on thousands of taxa. BMC Bioinformatics 8: 405.

Related Tools: parafit

Input:

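  • Files A (host-parasite association matrix), B (host principal coordinates) and C (parasite principal coordinates), in the same format as for ParaFit
  • Specify the numbers of rows and columns of A, B and C with the -n1, -n2, -n3 and -n4 switches
  • To use BLAS (Basic Linear Algebra Subprograms), run the BLAS-enabled executable AxParafitBLAS
  • For parallel runs, pass the trace file written by a global-test run (e.g. tracefile.GLOBAL) via the -t switch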

Run:

  • Global and individual test: AxParafitBLAS -p 10 -n1 129 -n2 430 -n3 42 -n4 136 -A smallA -B smallB -C smallC -n TEST
  • Global test only: AxParafitBLAS -g -p 10 -n1 129 -n2 430 -n3 42 -n4 136 -A smallA -B smallB -C smallC -n GLOBAL
  • Parallel: mpirun_rsh -np 4 -hostfile hostfile AxParafitBLAS -p 10 -n1 129 -n2 430 -n3 42 -n4 136 -A smallA -B smallB -C smallC -t tracefile.GLOBAL -n PARALLEL

Math:

  • D = C'A'B
  • ParaFitGlobal = trace(D'D) = sum over all i, j of d_ij^2
  • TraceMax = max(sum of squared eigenvalues of B, sum of squared eigenvalues of C)
  • ParaFitLink1 = trace - trace(k), where trace = ParaFitGlobal and trace(k) is trace(D'D) recomputed with link k removed from A
  • ParaFitLink2 = (trace - trace(k)) / (TraceMax - trace)

Output:

  • The output file ("outfile.TEST", named after the run name given with -n) contains the same kind of information as ParaFit's output
  • A second output file ("tracefile.GLOBAL", written by the global-test run) is a binary file used as input for the parallel computation

Improvements over parafit:

  • Reimplemented in C (ParaFit is written in Fortran), avoiding unnecessary memory allocation for matrices
  • Manually tuned compute-intensive for-loops
  • Uses BLAS for dense matrix-matrix multiplication and LAPACK for computing eigenvalues/eigenvectors; together these operations account for over 90% of ParaFit's execution time
  • Better numerical stability, since eigenvalues are computed with LAPACK rather than with DistPCoA (as used with ParaFit)
  • The statistical tests of individual host-parasite associations are independent of each other and are parallelized with MPI using a master-worker scheme

Discussion:

  • U-based analyses (unweighted/uniform: all branch lengths set to 1) are in general more sensitive to the number of permutations than W-based (branch-length-weighted) analyses
  • Increasing the number of permutations does not reduce the number of links whose significance differs between U- and W-based analyses

Example of output:

Permutations: 9 N1 139 N2 430, N3 421 N4 136
Sum of squared PCoA eigenvalues of B = 9739475.88043

Sum of squared PCoA eigenvalues of C = 316125752.28646

TraceTot = 316125752.28646

Global test of cospeciation: ParaFitGlobal = 141710945.81848 Prob = 0.10000

Test of individual host-parasite links:

F1 = ParaFitLink1 F2 = ParaFitLink2

Parasite 1 Host 336 F1 = 716557.30539 Prob1 = 0.10000 F2 = 0.00411 Prob2 = 0.10000
Parasite 1 Host 337 F1 = 716760.94339 Prob1 = 0.10000 F2 = 0.00411 Prob2 = 0.10000
Parasite 1 Host 338 F1 = 649660.46703 Prob1 = 0.10000 F2 = 0.00372 Prob2 = 0.10000
Parasite 1 Host 349 F1 = 650444.72648 Prob1 = 0.10000 F2 = 0.00373 Prob2 = 0.10000
Parasite 1 Host 350 F1 = 650444.73681 Prob1 = 0.10000 F2 = 0.00373 Prob2 = 0.10000
Parasite 1 Host 352 F1 = 650444.73539 Prob1 = 0.10000 F2 = 0.00373 Prob2 = 0.10000
Parasite 1 Host 354 F1 = 650444.74820 Prob1 = 0.10000 F2 = 0.00373 Prob2 = 0.10000
Parasite 1 Host 372 F1 = 648548.38671 Prob1 = 0.10000 F2 = 0.00372 Prob2 = 0.10000
Parasite 2 Host 370 F1 = 645883.63175 Prob1 = 0.10000 F2 = 0.00370 Prob2 = 0.10000
Parasite 3 Host 263 F1 = 551284.30655 Prob1 = 0.10000 F2 = 0.00316 Prob2 = 0.10000