iPToL ET 01JUL09
Attendees: Sriram, Andy, Steve, Karla, Nirav, Damian, Dan, Jerry, Liya, Matt, Todd Vision, Adam, Phil, Sheldon, Dan
Action Items:
Agenda
- Tree Reconciliation Working Group (Todd)
- Q&A with Todd
- Review 24JUN09 Action Items (Sheldon)
- General Update (Sheldon)
Notes
Presentation by Todd Vision:
Two post-tree analysis priorities
• Trait evolution
• Tree reconciliation
Incongruence: when gene trees differ from species trees. Causes, ordered by timescale from young to old:
• Lineage sorting and hybridization
• Gene duplication (and loss)
• Horizontal transfer (incl. endosymbiotic transfer)
Gene duplication and loss
• Homologs: genes descended from a common ancestor
• Reconciliation allows you to distinguish two kinds of homologs
o Orthologs: diverged through speciation
o Paralogs: diverged through duplication, whether or not they are in the same genome
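A minimal Python sketch of the ortholog/paralog distinction follows. It assumes that the gene tree's internal nodes have already been labelled as speciation ("S") or duplication ("D"), for example by a prior reconciliation. Under that assumption, two genes are orthologs if their least common ancestor in the gene tree is a speciation node and paralogs if it is a duplication node. The tuple encoding and the gene names a1, b1, a2, c1 (from hypothetical species A, B, C) are illustrative only.

```python
from typing import Set, Tuple, Union

# Hypothetical encoding: a leaf is a gene name; an internal node is
# (event, left, right), where event is "S" (speciation) or "D" (duplication),
# e.g. as labelled by an earlier reconciliation step.
Node = Union[str, Tuple[str, "Node", "Node"]]


def leaves(node: Node) -> Set[str]:
    """Return the set of gene names below `node`."""
    if isinstance(node, str):
        return {node}
    return leaves(node[1]) | leaves(node[2])


def lca(node: Node, a: str, b: str) -> Node:
    """Return the deepest internal node whose subtree holds both genes a and b."""
    for child in node[1:]:
        if not isinstance(child, str) and {a, b} <= leaves(child):
            return lca(child, a, b)
    return node


def classify(tree: Node, a: str, b: str) -> str:
    """Two distinct homologs: orthologs if their LCA is a speciation, else paralogs."""
    return "orthologs" if lca(tree, a, b)[0] == "S" else "paralogs"


# Gene a* comes from species A, b* from B, c* from C:
# a duplication at the root, followed by speciations.
GENE_TREE: Node = ("D", ("S", "a1", "b1"), ("S", "a2", "c1"))

print(classify(GENE_TREE, "a1", "b1"))  # orthologs
print(classify(GENE_TREE, "a1", "a2"))  # paralogs (same genome)
print(classify(GENE_TREE, "b1", "c1"))  # paralogs (different genomes)
```

Both a1/a2 (same genome) and b1/c1 (different genomes) come out as paralogs, which matches the definition in the notes.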
Gene tree reconciliation (GTR)
• Projection of a species tree onto a gene tree
• Inferring duplications (and optional losses)
o With incomplete genomes, losses are ambiguous
o Most frequent objective function is parsimony
o Probabilistic methods not yet fast enough for practical applications
• Lineage sorting, horizontal transfer
o Good recent algorithmic work, but a (mostly) separate literature
o The former (lineage sorting) is of greater interest here
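Parsimony-based inference of duplications and losses is commonly described as a least-common-ancestor (LCA) mapping. Each gene-tree node maps to the smallest species-tree clade that contains all of its species. A node is a duplication when it maps to the same species node as one of its children. Losses are counted from the species-tree levels skipped along each gene-tree edge. The Python below is an illustrative sketch under several assumptions: fully resolved (binary) and correctly rooted trees, and a gene-to-species table supplied by hand. It does not handle polytomies, lineage sorting or horizontal transfer, and it is not the code of any tool mentioned in these notes.

```python
from typing import Dict, FrozenSet, Tuple, Union

# Rooted, fully binary trees as nested pairs; leaves are names.
Tree = Union[str, Tuple["Tree", "Tree"]]
Clade = FrozenSet[str]


def index_clades(tree: Tree, depth: int, out: Dict[Clade, int]) -> Clade:
    """Record the depth (root = 0) of every species-tree clade; return its leaf set."""
    if isinstance(tree, str):
        clade = frozenset([tree])
    else:
        clade = index_clades(tree[0], depth + 1, out) | index_clades(tree[1], depth + 1, out)
    out[clade] = depth
    return clade


def lca(species: Clade, depths: Dict[Clade, int]) -> Clade:
    """Smallest (deepest) species-tree clade containing all of `species`."""
    return max((c for c in depths if species <= c), key=depths.__getitem__)


def reconcile(gene_tree: Tree, species_of: Dict[str, str],
              depths: Dict[Clade, int]) -> Tuple[int, int]:
    """Return (duplications, losses) under LCA mapping."""
    dups = losses = 0

    def walk(node: Tree) -> Clade:
        nonlocal dups, losses
        if isinstance(node, str):
            return lca(frozenset([species_of[node]]), depths)
        m_left, m_right = walk(node[0]), walk(node[1])
        m = lca(m_left | m_right, depths)
        is_dup = m in (m_left, m_right)  # same species node as a child
        if is_dup:
            dups += 1
        for m_child in (m_left, m_right):
            # Species-tree levels skipped on this edge imply losses; a speciation
            # node accounts for one level itself, a duplication node does not.
            losses += depths[m_child] - depths[m] - (0 if is_dup else 1)
        return m

    walk(gene_tree)
    return dups, losses


depths: Dict[Clade, int] = {}
index_clades((("A", "B"), "C"), 0, depths)
species_of = {"a1": "A", "b1": "B", "a2": "A", "c1": "C"}
print(reconcile((("a1", "b1"), ("a2", "c1")), species_of, depths))  # (1, 2)
```

In the example, the root is a duplication. One copy was lost in lineage C and the other in lineage B, which gives one duplication and two losses.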
Some application of GTR
• What nodes are duplications (and what missing nodes represent losses)?
• Which sets of genes are orthologous?
• What was the complement of genes in a given ancestral species?
• What is the (rooted) species tree?
• Where in the phylogeny did the ancient polyploidy events occur (and how many duplicates have survived)?
• Are gene families coevolving (and thus potentially interacting)?
Complications
• Polytomies (in either species or gene tree)
• Uncertainty in rooting (particularly the gene tree)
• Algorithm performance has not been thoroughly tested
o Speed on large trees
o Accuracy (particularly if incongruence is not due to one factor alone)
• Confidence measures are generally lacking
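Rooting uncertainty can be illustrated with one commonly used criterion: try every edge of an unrooted gene tree as the root position and keep the rootings that minimize inferred duplications. The sketch below makes several assumptions. It requires binary trees and uses the same LCA-mapping duplication count as a standard parsimony reconciliation. The internal-node ids x and y are hypothetical. The example only shows that several roots can tie, which is one reason confidence measures matter.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]
Clade = FrozenSet[str]


def index_clades(tree: Tree, depth: int, out: Dict[Clade, int]) -> Clade:
    """Record the depth of every species-tree clade; return its leaf set."""
    if isinstance(tree, str):
        clade = frozenset([tree])
    else:
        clade = index_clades(tree[0], depth + 1, out) | index_clades(tree[1], depth + 1, out)
    out[clade] = depth
    return clade


def count_duplications(gene_tree: Tree, species_of: Dict[str, str],
                       depths: Dict[Clade, int]) -> int:
    """Number of duplication nodes under LCA mapping."""
    def lca(s: Clade) -> Clade:
        return max((c for c in depths if s <= c), key=depths.__getitem__)

    def walk(node: Tree) -> Tuple[Clade, int]:
        if isinstance(node, str):
            return lca(frozenset([species_of[node]])), 0
        (m_l, d_l), (m_r, d_r) = walk(node[0]), walk(node[1])
        m = lca(m_l | m_r)
        return m, d_l + d_r + (1 if m in (m_l, m_r) else 0)

    return walk(gene_tree)[1]


def rooted_at(adj: Dict[str, List[str]], u: str, v: str) -> Tree:
    """Place the root on edge (u, v) of an unrooted binary tree."""
    def subtree(node: str, parent: str) -> Tree:
        kids = [n for n in adj[node] if n != parent]
        if not kids:
            return node  # degree-1 node: a gene leaf
        left, right = kids  # internal nodes have degree 3
        return (subtree(left, node), subtree(right, node))

    return (subtree(u, v), subtree(v, u))


def best_roots(adj: Dict[str, List[str]], species_of: Dict[str, str],
               depths: Dict[Clade, int]) -> Tuple[int, List[Tuple[str, str]]]:
    """Score every edge as a root position; return the minimum and all edges reaching it."""
    edges = sorted({tuple(sorted((u, v))) for u in adj for v in adj[u]})
    scores = [(count_duplications(rooted_at(adj, u, v), species_of, depths), (u, v))
              for u, v in edges]
    best = min(s for s, _ in scores)
    return best, [e for s, e in scores if s == best]


depths: Dict[Clade, int] = {}
index_clades((("A", "B"), "C"), 0, depths)
species_of = {"a1": "A", "b1": "B", "a2": "A", "c1": "C"}
# Unrooted gene tree; "x" and "y" are hypothetical internal-node ids.
adj = {
    "x": ["a1", "b1", "y"], "y": ["x", "a2", "c1"],
    "a1": ["x"], "b1": ["x"], "a2": ["y"], "c1": ["y"],
}
print(best_roots(adj, species_of, depths))
# (1, [('a2', 'y'), ('c1', 'y'), ('x', 'y')])
```

Three of the five possible root positions tie at one duplication, so the duplication criterion alone does not settle the root in this case.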
Bottlenecks and Checkpoints
• Obtaining conservatively resolved rooted species trees, possibly with branch lengths
• Obtaining gene trees with confidence values, optionally rooted, from online sources (or calculating them…)
o Determining user needs for gene tree metadata (to enable search and retrieval)
o Enabling user upload
• Aligning taxonomic identifiers between the species and gene trees
• Determining algorithms for reconciliation
o Rooting
o Confidence values
o Objective function (dup only, dup+loss, lineage sorting, hybridization, horizontal transfer, etc)
o Speed and accuracy
o Deciding on the extent of user options
• Determining user needs for analysis results (orthology, ancestral gene content, domain evolution, etc)
• Formatting, visualization, and exchange of results
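Aligning taxonomic identifiers is often the first practical hurdle. The following is a minimal sketch under two assumptions. First, gene-tree leaves are labelled GENE_SPECIESCODE, in the style of UniProt entry names. Second, a lookup table from species codes to species-tree taxon names is available. Given these, each leaf can be mapped, and any leaf whose species is missing from the species tree can be reported. The function name, label convention and example data are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def align_taxa(
    gene_labels: List[str],
    tree_taxa: Set[str],
    code_to_taxon: Dict[str, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Map gene-tree leaf labels to species-tree taxa; also return labels that fail to map."""
    mapping: Dict[str, str] = {}
    unmapped: List[str] = []
    for label in gene_labels:
        # Assumed convention: the species code follows the last underscore.
        code = label.rsplit("_", 1)[-1]
        taxon = code_to_taxon.get(code)
        if taxon is not None and taxon in tree_taxa:
            mapping[label] = taxon
        else:
            unmapped.append(label)
    return mapping, unmapped


# Hypothetical inputs in the style of UniProt entry names.
codes = {"ARATH": "Arabidopsis thaliana", "ORYSJ": "Oryza sativa",
         "SOLLC": "Solanum lycopersicum"}
taxa = {"Arabidopsis thaliana", "Oryza sativa"}
mapping, unmapped = align_taxa(
    ["PHYA_ARATH", "PHYA_ORYSJ", "PHYB_ARATH", "PHYA_SOLLC"], taxa, codes)
print(unmapped)  # ['PHYA_SOLLC']
```

Reporting unmapped leaves, rather than silently dropping them, lets a user decide whether to prune the gene tree or extend the species tree before reconciliation.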
Applicable existing tools and software components
• NOTUNG (Dannie Durand’s group): the most full-featured and well-maintained software
o Includes a version of Zmasek’s ATV viewer for visualization
• Zmasek & Eddy (2001): fast heuristic algorithm, implemented in a few...
Expectations of the engagement team and iPlant developers
• Of iPlant as a whole
o Aggressive project management and cross-WG coordination
• Of the developers
o Expertise in technologies to be deployed
o Acquisition of requisite domain knowledge
o Design charrettes and feedback cycles with external users
o Open and iterative development
• Mailing lists, a public website with syndicated news, etc.
• Weekly status checks and frequent opportunities for feedback
• Releasing software (incl. source code) early and often
• Of the WG scientists
o Service orientation
o Algorithm agnosticism
o An interest in rigorously benchmarking scalability and accuracy