Big Trees
The tree reconstruction working group will develop and scale up computational methods for large-scale phylogenetic tree reconstruction, with a specific focus on the widely used Maximum Likelihood (ML) method. The goal of this effort is to reconstruct a comprehensive tree of life for the green plants that describes the evolutionary relationships among approximately 500,000 organisms. This tree will then serve as a backbone for the iPToL infrastructure, which will progressively enrich the bare tree with biological meta-information and provide novel post-analysis methods for conveniently exploring the information the tree contains.
Current ML-based tree reconstruction programs can reconstruct trees from data sets with approximately 60,000 organisms and a few genes. To achieve our goals, we therefore need to improve the scalability of current methods by roughly one order of magnitude (from about 60,000 to about 500,000 organisms). This can only be achieved through a combination of algorithmic progress and advances in parallel computing. In addition, we will require scalable consensus tree reconstruction methods to summarize the information contained in the candidate trees. We propose to begin by redesigning the search algorithms so that they can be parallelized more easily and scale to thousands of CPUs, which is currently not the case for the data sets we intend to analyze. We will also assess emerging parallel architectures, such as the Intel Larrabee, for meeting the computational requirements. The tree reconstruction tool will require some form of non-determinism, coupled with a mechanism to reduce the dimensionality of the trees (and hence the search space), and will also require multi-grain parallelism to increase parallel efficiency.
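To give a rough sense of why the search space matters at this scale, the sketch below computes the base-10 logarithm of the number of distinct unrooted binary tree topologies on n labeled organisms, which is the standard count (2n-5)!!. The function name and the choice of Python are ours, for illustration only, and are not part of any tool described here. The count says nothing about running time of a particular program; it only shows how quickly the number of candidate topologies grows.

```python
import math


def log10_unrooted_topologies(n_taxa: int) -> float:
    """Return log10 of (2n-5)!!, the number of unrooted binary
    tree topologies on n_taxa labeled leaves (hypothetical helper)."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    # Identity: (2k-1)!! = (2k)! / (2^k * k!), with k = n_taxa - 2.
    k = n_taxa - 2
    # Work with log-gamma so that very large n does not overflow.
    ln_count = math.lgamma(2 * k + 1) - k * math.log(2) - math.lgamma(k + 1)
    return ln_count / math.log(10)


if __name__ == "__main__":
    # Taxon counts taken from the text: current limit vs. target size.
    for n in (60_000, 500_000):
        digits = log10_unrooted_topologies(n)
        print(f"n = {n}: about 10^{digits:.0f} unrooted topologies")
```

Because exhaustive enumeration is out of the question at either size, any practical tool has to rely on heuristic search, which is why restricting the set of trees that the search considers is attractive.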
Beyond building the plant tree of life, such a tool, which will be made available as open-source code, will greatly benefit other research communities that need to infer trees with hundreds of thousands of organisms. Close collaboration with biologists within the iPToL framework will also allow rapid, early empirical assessment of both the large trees and the algorithmic prototypes, which is a great asset to both communities for accelerating progress.
Progress report
Working Group Members
Name | Role | Institution
---|---|---
Alexis Stamatakis | RAxML Lead | Heidelberg Institute of Theoretical Studies
Fernando Izquierdo | Postdoctoral Fellow | Heidelberg Institute of Theoretical Studies
Travis Wheeler | Collaborator | Howard Hughes Medical Institute, Janelia Farm
Naim Matasci | Scientific Lead | iPlant Collaborative, University of Arizona
John Cazes | RAxML Developer | iPlant Collaborative, Texas Advanced Computing Center
Robert McLay | WindJammer Developer | iPlant Collaborative, Texas Advanced Computing Center