Assembling the Tree of Life for the Plant Sciences (iPToL)

New Progress report last updated Jan 24, 2012

Knowledge of evolutionary relationships is fundamental to biology, yielding new insights across the plant sciences, from comparative genomics and molecular evolution, to plant development, to the study of adaptation, speciation, community assembly, and ecosystem functioning. Although our understanding of the phylogeny of the half million known species of green plants has expanded dramatically over the past two decades, assembling a comprehensive "tree of life" for them remains a Grand Challenge. Solving it will require a significant intellectual investment at the developing intersection of phylogenetic biology and computer science. We have brought together plant biologists and computer scientists to build the cyberinfrastructure needed to scale up phylogenetic methods by 100-fold or more, to enable the dissemination of data associated with such large trees, and to implement scalable "post-tree" analysis tools that foster integration of the plant tree of life with the rest of the botanical sciences.

The effort to unravel the evolutionary relationships among all living things, and to express them in the form of a phylogenetic tree of life, is one of the most profound scientific challenges ever undertaken and represents a true "moonshot" for the life sciences. We anticipate that early success in addressing the plant phylogeny problem will be especially useful to other Grand Challenge projects supported through the iPlant Collaborative that involve comparisons between genes, genomes, or species, ensuring a broad impact for the project as a whole. Finally, the plant tree of life provides exciting opportunities for training and outreach at all levels. Since Darwin, the tree of life has proven to be a very accessible visual metaphor for nonscientists, providing an elegant opening for communicating results in the plant sciences and evolutionary biology to people with diverse backgrounds.

Participants in the large organizational workshop that led to our white paper agreed on the fundamental importance of four main goals. They are, in roughly increasing order of difficulty:

  1. Developing scalable cyberinfrastructure for the analysis of two specific problems of widespread interest that require a phylogeny: inferring ancestral traits (and trait changes) on a tree, and inferring the history of gene duplication and loss in gene families.
  2. Developing database integration cyberinfrastructure to enable relatively seamless import of trees from existing tree databases (e.g., TreeBASE, Pfam).
  3. Developing high-performance computing tools to permit reconstruction of trees 1-2 orders of magnitude larger than is currently practical, and assembling plant data sets that take advantage of this new cyberinfrastructure.
  4. Developing tree visualization tools that scale well to large trees and communicate evolutionary relationships and tree annotations effectively to disparate end users.

Collaborative implementation is organized into working groups with focused development goals. Each group has an iPToL superuser or faculty member designated as the lead and point of contact. The four main working groups are Big Trees, Data Assembly, Tree Reconciliation, and Trait Evolution. Three crosscutting working groups develop shared data and compute infrastructure: BIEN, Data Integration (merged with Data Assembly), and Tree Visualization.

Education, Outreach and Training

The iPlant Collaborative offers opportunities for novel approaches to education, outreach, and training at multiple levels, from K-12 to the citizen naturalist to the scientifically literate layperson to the fledgling scientist in training. We envision creative ways to use cyberinfrastructure (CI) to teach about plant biology, and new opportunities to train teachers and students in the use of CI. We propose cross-training in biology and computer science for students of all ages, and teacher workshops for training in the use and implementation of CI for teaching plant biology. Our basic, general goals for K-12 education and public outreach are:

  1. Develop CI for application to K-12 education and provide training to teachers to integrate the resulting tools into curricula.
  2. Facilitate access to effective educational materials for a broad public audience (e.g., through websites, YouTube, and new CI developed through this project).
  3. Facilitate access to journals, data, and other information for students and post-docs.

We propose to meet these goals through collaboration with personnel from iPlant and from other Grand Challenge projects.


Tree of Life Community Leaders

Main contacts

Pamela Soltis, Florida Museum of Natural History, University of Florida. Research and outreach interests: Angiosperm phylogeny, polyploidy (both ancient and recent), and the origin and evolution of the flower; student mentoring and public outreach through teacher education and museum exhibits and programs.
Douglas Soltis, Department of Botany, University of Florida. Research and outreach interests: Angiosperm phylogeny, genetic and genomic consequences of genome doubling (both ancient and recent), phylogeography, conservation genetics, and the origin and subsequent diversification of the flower.

Plant Science Community Leaders

Michael Donoghue, Department of Ecology and Evolutionary Biology, Yale University. Research interests: Diversity and evolution of flowering plants, using phylogenetic trees to understand patterns of diversification, character evolution, biogeography, and ecology. As Director of Yale's Peabody Museum of Natural History, he was directly involved in K-12 and family education and outreach activities, including the production of a museum exhibition entitled "Travels in the Great Tree of Life".

Computational Science Community Leaders

Val Tannen, Department of Computer and Information Science, University of Pennsylvania. Research and outreach interests: Databases and bioinformatics; systems for data integration and sharing between collaborating scientists; data provenance; phylogenetic data modeling; and the integration of AToL data resources.
Alexandros Stamatakis, Department of Computer Science, Technische Universität München. Research and outreach interests: Design of algorithmic and HPC solutions for large-scale phylogenetic inference; fostering communication and collaboration between computer scientists and biologists.
Todd Vision, Department of Biology, University of North Carolina. Research and outreach interests: Computational genomics and genome evolution. He teaches courses in computational and evolutionary genetics at UNC Chapel Hill, and has been an organizer since 2007 of the Phyloinformatics Summer Courses and Phyloinformatics Summer of Code through the National Evolutionary Synthesis Center. Vision has worked with the Destiny Science Bus program to bring inquiry-driven bioinformatics, plant biology and evolutionary biology educational opportunities to underserved secondary students in North Carolina.

iPToL Engagement Team

Naim Matasci, Scientific Lead, University of Arizona
Jerry Lu, Engagement Team Analyst, Cold Spring Harbor Laboratory
Liya Wang, Developer, Cold Spring Harbor Laboratory
Kurt Michels, Statistical Analyst, University of Arizona