Issues in assembly of a gene catalog (phone discussion w/ Volker Brendel of PlantGDB)
Discussion of publication ideas (see below)
Notes
Ideas about visualization
Impressed with the Kubach tree visualization tool for the big species and big gene trees. Especially liked the semantic zooming of clade detail, which we felt was a better way to collapse detail than a hyperbolic tree.
Would like to make sure the DE displays three tree panels simultaneously: A - the species tree, B - the gene tree "ATV-style" (with labeled nodes), and C - a very highly magnified "fat tree" (for one or at most a few branches at a time). The "fat tree" was deemed useful pedagogically and for those interested in hybridization and species tree questions; it should be readily available if not always present, and not an 'expert option'. Additional requirements:
The region shown in C would have a corresponding panning box in B.
Selecting regions on the gene tree would light up corresponding regions on the species tree and vice versa.
The WG liked the idea of highlighting orthology groups dynamically on mouse-over, as in the Princeton Protein Orthology Database (http://ppod.princeton.edu/), but felt there may be better visual strategies than the one used there to indicate which genes are in an orthology group.
Colorization of labels by e.g. species is nice, but likely to be problematic unless there are a limited number of groups, e.g. all the genes in one organismal clade relative to all the others.
It would be desirable to make sure the user can easily see how many duplication events or other incongruences separate two genes, and where on the tree they fall. Not sure how to accomplish that, though.
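One possible starting point for the last item, offered only as a sketch: if the gene tree has already been reconciled so that each internal node is labeled as a speciation or a duplication, the duplications separating two genes are the duplication nodes on the tree path between them (up from one gene to their last common ancestor, then down to the other). The `Node` class, the `event` labels, `duplications_between`, and the toy tree in `example_tree` are all hypothetical names made up for illustration; they are not part of the DE or of any existing tool.

```python
class Node:
    """Gene tree node; `event` is assumed to come from a prior reconciliation."""

    def __init__(self, name, event="leaf"):
        self.name = name
        self.event = event  # "leaf", "speciation", or "duplication"
        self.parent = None
        self.children = []

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def path_to_root(node):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def duplications_between(a, b):
    """Return the duplication nodes on the path between nodes a and b."""
    ancestors_a = path_to_root(a)
    seen = {id(n) for n in ancestors_a}
    # Climb from b until we reach a node that is also an ancestor of a.
    path_b = []
    node = b
    while node is not None and id(node) not in seen:
        path_b.append(node)
        node = node.parent
    if node is None:
        raise ValueError("nodes are not in the same tree")
    lca = node
    path_a = ancestors_a[:ancestors_a.index(lca)]
    return [n for n in path_a + [lca] + path_b if n.event == "duplication"]


def example_tree():
    """Toy tree: duplication D1 at the root, a second duplication D2 in species B."""
    root = Node("D1", "duplication")
    s1 = root.add(Node("S1", "speciation"))
    s2 = root.add(Node("S2", "speciation"))
    d2 = s2.add(Node("D2", "duplication"))
    leaves = [
        s1.add(Node("A_g1")), s1.add(Node("B_g1")), s2.add(Node("A_g2")),
        d2.add(Node("B_g2a")), d2.add(Node("B_g2b")),
    ]
    return {leaf.name: leaf for leaf in leaves}


if __name__ == "__main__":
    genes = example_tree()
    hits = duplications_between(genes["A_g1"], genes["B_g2a"])
    print([n.name for n in hits])
```

The returned nodes give both the count (the length of the list) and the locations, so an interface could highlight them on the gene tree when two genes are selected. Other kinds of incongruence (losses, transfers) would need additional event labels and are not handled here.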
Ideas for publications
Tests of the accuracy and scalability of different algorithms with different biological models.
Accuracy and timing as a function of size, as part of a short overview of the scale challenge - Bengt & Todd
Accuracy as a function of model violations - on tree sizes small enough to use Prime-GSR - Bengt & Cecile
See "Planning of simulation study" above for details
Accuracy using Bowers benchmark dataset, with a more biologically focused paper - Jamie & Jim
Jim is interested in an additional paper on the effect of heterogeneity in background duplication rates and polyploidy, if time allows. This would naturally follow on from b and c.
Confidence measures on phylogenies when using non-probabilistic objective functions? [Not sure this stands on its own as currently conceived] - Cecile & Todd
iPlant Discovery Environment (DE) tree reconciliation: gene catalog, pipeline, and interface
Rolled into 1KP pilot data, presenting analysis as an enhanced publication - Jamie & Jim
Possibly a separate application note, with more focus on the interface - Todd, Sheldon, Andrew
Review of the state of the art in gene tree reconciliation, with a focus on combinations of different processes happening simultaneously. [Combine with 1a?] - All, Todd as lead