TR_problem_statements_sep09

Some "Tree Reconciliation" Problem Statements for the September 2009 Developers Meeting in Austin

All: Please add your own problem statements, modify the existing ones, and leave comments on priority, interest, and feasibility. Thanks, Todd

1. What are the orthologs in the selected taxa for a focal gene?

A biologist would like to undertake an experiment that involves manipulating related genes in different plant species, and has certain taxa in mind. Given those taxa, but taking into account the rest of the gene tree and its reconciliation with the species tree, which genes in the other taxa are orthologs of the focal gene (i.e., there is a speciation event at their divergence node) and which are paralogs (i.e., there is a duplication event at the divergence node)? The biologist may want to specify the focal gene by an identifier from a sequence database (e.g., GenBank), an identifier from a model organism database (e.g., an AT# from Arabidopsis), or an EST sequence of their own. They would be interested in knowing the level of uncertainty in these conclusions (due to uncertainty in tree topology and rooting) and in visualizing the gene tree within the species tree of the selected taxa (PrimeTV style).
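
One way to make the ortholog/paralog definition above concrete is the standard least-common-ancestor (LCA) mapping of a gene tree onto a species tree, sketched below. This is an illustration under stated assumptions, not a proposed implementation: both trees are assumed rooted, binary, and fully resolved, written as nested tuples; every gene is assumed to map to exactly one species; and topology and rooting uncertainty are ignored. All function names (`orthology_calls`, `mapped_clade`, etc.) and the toy trees are hypothetical. A gene node is labeled a duplication when it maps to the same species node as one of its children; genes that diverge from the focal gene at a speciation node are called orthologs, and those diverging at a duplication node are called paralogs.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple, Union

# Hypothetical representation: a rooted binary tree as nested 2-tuples of leaf names.
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf names in this subtree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leaves(tree[0]) | leaves(tree[1])


def species_lca(species_tree: Tree, taxa: FrozenSet[str]) -> Tree:
    """Lowest species-tree node whose clade contains every taxon in `taxa`."""
    node = species_tree
    while not isinstance(node, str):
        left, right = node
        if taxa <= leaves(left):
            node = left
        elif taxa <= leaves(right):
            node = right
        else:
            break
    return node


def mapped_clade(gene_node: Tree, gene_to_species: Dict[str, str],
                 species_tree: Tree) -> FrozenSet[str]:
    """LCA mapping M(g), identified by the leaf set of the species node it maps to."""
    taxa = frozenset(gene_to_species[g] for g in leaves(gene_node))
    return leaves(species_lca(species_tree, taxa))


def is_duplication(gene_node: Tree, gene_to_species: Dict[str, str],
                   species_tree: Tree) -> bool:
    """An internal gene node g is a duplication if M(g) == M(child) for some child."""
    m = mapped_clade(gene_node, gene_to_species, species_tree)
    return any(mapped_clade(c, gene_to_species, species_tree) == m for c in gene_node)


def orthology_calls(gene_tree: Tree, focal: str, gene_to_species: Dict[str, str],
                    species_tree: Tree,
                    selected_taxa: Optional[Iterable[str]] = None) -> Dict[str, str]:
    """Label each other gene (optionally only in `selected_taxa`) relative to `focal`."""
    if focal not in leaves(gene_tree):
        raise ValueError(f"{focal!r} is not a leaf of the gene tree")
    keep = set(selected_taxa) if selected_taxa is not None else None
    calls: Dict[str, str] = {}
    node = gene_tree
    # Walk from the root toward the focal gene; genes in the sibling subtree
    # at each step diverged from the focal gene at exactly this node.
    while not isinstance(node, str):
        left, right = node
        near, far = (left, right) if focal in leaves(left) else (right, left)
        dup = is_duplication(node, gene_to_species, species_tree)
        for gene in leaves(far):
            if keep is None or gene_to_species[gene] in keep:
                calls[gene] = "paralog" if dup else "ortholog"
        node = near
    return calls


# Hypothetical example: species tree ((A,B),C); one duplication in the (A,B) ancestor.
species = (("A", "B"), "C")
genes = ((("a1", "b1"), ("a2", "b2")), "c1")
g2s = {"a1": "A", "b1": "B", "a2": "A", "b2": "B", "c1": "C"}
calls = orthology_calls(genes, "a1", g2s, species)
```

Here `b1` and `c1` come out as orthologs of `a1`, while `a2` and `b2` are paralogs because they split from `a1` at the duplication node. Passing `selected_taxa=["B"]` restricts the report to genes from species B while still using the full gene tree for the reconciliation, which mirrors the "selected taxa, but the rest of the tree matters" framing above.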

1B. Where in evolution did protein domain structural changes occur within a gene family?

A related problem is to identify where on the species tree changes in protein domain structure (e.g., InterPro domain presence/absence, number, or order) occurred, for the same set of orthologs and paralogs.
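
As one illustration of placing domain changes on branches, the sketch below uses Fitch parsimony for a single discrete character (here, presence = 1 / absence = 0 of one domain) on a rooted binary tree. Fitch parsimony is swapped in purely for illustration; the problem statement does not name a method, and a likelihood-based or reconciliation-aware model may be more appropriate. The nested-tuple tree format, the function names, and the example states are hypothetical. Domain number or order could be coded as additional unordered states, and ties in the top-down pass are broken arbitrarily (by `min`), so only one of possibly several most-parsimonious placements is reported.

```python
from typing import Dict, List, Optional, Set, Tuple, Union

# Hypothetical representation: a rooted binary tree as nested 2-tuples of leaf names.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_up(node: Tree, states: Dict[str, int], sets: Dict[Tree, Set[int]]) -> Set[int]:
    """Bottom-up pass: candidate state set for each node under Fitch parsimony."""
    if isinstance(node, str):
        s = {states[node]}
    else:
        a = fitch_up(node[0], states, sets)
        b = fitch_up(node[1], states, sets)
        s = (a & b) or (a | b)  # intersection if non-empty, otherwise union
    sets[node] = s
    return s


def fitch_down(node: Tree, parent_state: Optional[int], sets: Dict[Tree, Set[int]],
               changes: List[Tuple[Tree, int, int]]) -> None:
    """Top-down pass: fix one state per node and record branches where it changes."""
    s = sets[node]
    # Keep the parent's state when allowed; otherwise pick one (ties broken by min).
    state = parent_state if parent_state in s else min(s)
    if parent_state is not None and state != parent_state:
        changes.append((node, parent_state, state))
    if not isinstance(node, str):
        for child in node:
            fitch_down(child, state, sets, changes)


def domain_changes(tree: Tree, states: Dict[str, int]) -> List[Tuple[Tree, int, int]]:
    """Return (node below the branch, ancestral state, derived state) for each change."""
    sets: Dict[Tree, Set[int]] = {}
    fitch_up(tree, states, sets)
    changes: List[Tuple[Tree, int, int]] = []
    fitch_down(tree, None, sets, changes)
    return changes


# Hypothetical example: domain present (1) in A, B, C and absent (0) in D.
tree = (("A", "B"), ("C", "D"))
changes = domain_changes(tree, {"A": 1, "B": 1, "C": 1, "D": 0})
# changes == [("D", 1, 0)]: a single loss on the branch leading to D
```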

2. What are the gene families that expanded or contracted in number along a particular phylogenetic branch or path?

A biologist is analyzing the gene content of a newly sequenced plant and would like to know which gene families underwent duplication or loss along the phylogenetic path connecting the newly sequenced plant to its most closely related sequenced relatives. They would like to see the results ranked by the proportion of duplicates/losses per copy, summarized either by gene family or by Gene Ontology (GO) slim terms. They would like confidence intervals on these proportions that take into account uncertainty in both the gene family trees and the species tree. It would also be useful to explore the results under specific alternative species phylogenies.
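
A minimal sketch of the ranking step follows, assuming that tree uncertainty is represented by a sample of reconciliations (for example, each bootstrap or posterior gene tree reconciled against each sampled species tree) and that "per copy" means duplications on the branch or path of interest divided by the number of gene copies in the family. Both assumptions, the function names, and the counts are hypothetical; producing the sampled counts (the reconciliation itself) is not shown. Families are ranked by the median proportion across samples, with a percentile interval over the samples. Such an interval reflects variation among the sampled trees only, not sampling error in the counts themselves.

```python
import statistics
from typing import Dict, List, Tuple


def percentile(values: List[float], q: float) -> float:
    """Linear-interpolation percentile of `values`, with q in [0, 1]."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def rank_families(samples: Dict[str, List[Tuple[int, int]]],
                  level: float = 0.95) -> List[Tuple[str, float, float, float]]:
    """Rank families by median duplications-per-copy across sampled reconciliations.

    `samples` maps family -> [(duplications, copies), ...], one pair per sampled
    reconciliation. Returns (family, median, lower, upper), highest median first.
    """
    alpha = (1.0 - level) / 2.0
    rows = []
    for family, draws in samples.items():
        props = [d / c for d, c in draws if c > 0]
        if not props:
            continue  # no usable samples for this family
        rows.append((family, statistics.median(props),
                     percentile(props, alpha), percentile(props, 1.0 - alpha)))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Hypothetical counts from four and three sampled reconciliations, respectively.
samples = {
    "fam1": [(3, 10), (4, 10), (2, 10), (3, 10)],
    "fam2": [(1, 8), (1, 8), (2, 8)],
}
ranked = rank_families(samples)
```

Grouping by GO slim term instead of by family would only change the keys of `samples`, e.g. by pooling the counts of all families annotated with the same slim term before computing the proportions.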

3. What are the gene families that show similar patterns of expansion or contraction?

A biologist would like to identify gene families of unknown function that are coevolving with gene families of known function, based on correlation in the number of duplications (and possibly losses) on the branches of the species phylogeny. Given one gene family, which other gene families are "close" to it in this reconciliation space, and what are their annotations (e.g., GO assignments)?
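
To illustrate "closeness" in reconciliation space, the sketch below represents each family as a vector of duplication counts on the same branches of the species tree (listed in a fixed order) and ranks the other families by Pearson correlation with the query family. Pearson correlation is one assumed choice of similarity; a rank correlation or a distance on normalized profiles would also fit the description. The family names, counts, and annotation strings are hypothetical, and losses could be appended to each vector as extra entries.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either profile is constant."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same branches")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant profile
    return sxy / math.sqrt(sxx * syy)


def nearest_families(profiles: Dict[str, List[int]], query: str,
                     annotations: Dict[str, List[str]], k: int = 5
                     ) -> List[Tuple[str, float, List[str]]]:
    """Families whose per-branch duplication counts correlate most with `query`."""
    target = profiles[query]
    scored = [(fam, pearson(target, vec), annotations.get(fam, []))
              for fam, vec in profiles.items() if fam != query]
    scored.sort(key=lambda row: row[1], reverse=True)
    return scored[:k]


# Hypothetical duplication counts on the same five species-tree branches.
profiles = {
    "famA": [0, 2, 1, 0, 3],
    "famB": [0, 4, 2, 0, 6],
    "famC": [3, 0, 1, 2, 0],
}
annotations = {"famB": ["annotation-B"], "famC": ["annotation-C"]}
hits = nearest_families(profiles, "famA", annotations)
```

In this toy case `famB` duplicates on exactly the same branches as `famA` (at twice the rate), so it ranks first with a correlation of about 1.0, while `famC` is negatively correlated.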

4. Is phylogenetic incongruity in a given gene family due to processes other than duplication and loss?

For genes from closely related species whose phylogenies are incongruent with the species phylogeny, what is the relative support for duplication/loss versus lineage sorting or hybridization? For distantly related genes, what is the relative support for lateral transfer versus duplication/loss?