Needs for Web Service to Support Multiple Reconciliations

Overview

The basic need:

Currently, the standalone tree reconciliation browser provides an interface for users to access gene trees reconciled to their host species trees. This browser is limited to providing results from a single software source for mapping the gene trees onto the species trees. For example, we can show the results from a set of TREEBEST reconciliations for all gene trees mapped to a species tree. However, the browser cannot currently be used to query or visualize multiple reconciliation results for an individual gene tree. Other results could include, for example, a 'canonical' result based on a synteny-informed reconstruction of gene duplication history, or the result from a different program such as Phyldog. Ideally, these results would be queried within the scope of a single connection to the same database.

This is a limitation of the current browser. We want the browser to be able to show the results from different programs, since we do not know a priori which program would provide the 'best' reconstruction of gene family history.

The MySQL database that holds the tree reconciliation information DOES allow for multiple reconciliations to be stored, although the way these data are stored may need to be optimized for quick retrieval of reconciled tree sets. (Here, a 'reconciled tree set' refers to a group of gene trees reconciled to a species tree using the same software with the same parameter values.)
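
As a rough illustration of the 'reconciled tree set' definition above, the following Python sketch groups individual reconciliations by software and parameter values. The record layout and field names are hypothetical and are not taken from the TR schema; the point is only that the pair (software, parameter values) acts as the key for a set.

```python
from collections import defaultdict

# Hypothetical records: one per reconciliation of a gene tree to the species tree.
# Field names are illustrative only and are not taken from the TR schema.
reconciliations = [
    {"gene_family": "fam1", "software": "TREEBEST", "params": {"mode": "default"}},
    {"gene_family": "fam2", "software": "TREEBEST", "params": {"mode": "default"}},
    {"gene_family": "fam1", "software": "Phyldog", "params": {"mode": "default"}},
]


def set_key(record):
    """Identify a reconciled tree set by software plus sorted parameter values."""
    return (record["software"], tuple(sorted(record["params"].items())))


def group_into_sets(records):
    """Map each (software, parameters) key to the gene families it covers."""
    sets = defaultdict(list)
    for record in records:
        sets[set_key(record)].append(record["gene_family"])
    return dict(sets)


tree_sets = group_into_sets(reconciliations)
for (software, params), families in sorted(tree_sets.items()):
    print(software, dict(params), families)
```

Two runs of the same software with different parameter values would produce different keys here, and so would count as two separate sets.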

General goal:

Give users access to results from multiple reconciliation approaches (or even different parameter values within the same software package) when querying the TR standalone database.

General computational process:

Given a set of plant genes for multiple species that have been classified into gene families (or gene clusters) we want to:

  1. Run multiple reconciliation pipelines so that each gene family is represented by a result for each pipeline
    (This is currently supported for TREEBEST and PRiME-GSR in code that Sheldon wrote. Jamie is working on supporting Phyldog, and we have an example of synteny-based reconstruction for a small set of species.)

  2. Store the results for these reconciliations within a single TR database
    (This is supported in the current TRDB schema, but the schema may need to be slightly modified to speed up queries that fetch entire sets at once.)

  3. Allow the TR viewer to access these multiple reconciliations within the TR database

Components requiring updates

The components that will require updating to support this are:

  1. Database schema

    1. Minor changes to database schema to better support multiple reconciliations

  2. Database loading scripts

    1. Additional script to load reconciliation meta-data into the database

    2. Changes to the scripts that load reconciliation results

  3. Web-API

    1. Modifications to various components to become aware of reconciliation sets

    2. Possible additions to support new types of queries that are reconciliation set aware

      1. Example 1: Generate a table that summarizes gene family results for all the reconciliation processes used

  4. TR-Standalone Viewer

    1. Modifications to support the option to see multiple reconciliations 
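
The summary table mentioned under the Web-API item (Example 1) could be sketched as follows. This is a minimal Python sketch under stated assumptions: the row layout, the reconciliation set names, and the choice of duplication counts as the summarized value are all hypothetical, and the numbers are made-up placeholders rather than real results.

```python
# Hypothetical rows as they might come back from a Web-API query:
# (gene family, reconciliation set name, number of inferred duplications).
# All names and values are made-up placeholders, not real results.
rows = [
    ("fam1", "treebest_default", 3),
    ("fam1", "phyldog_default", 2),
    ("fam2", "treebest_default", 0),
]


def summarize(rows):
    """Pivot rows into {gene_family: {set_name: duplications}} plus sorted set names."""
    table = {}
    set_names = set()
    for family, set_name, duplications in rows:
        table.setdefault(family, {})[set_name] = duplications
        set_names.add(set_name)
    return table, sorted(set_names)


def format_table(table, set_names):
    """Render tab-separated text; '-' marks a family with no result in a set."""
    lines = ["\t".join(["gene_family"] + set_names)]
    for family in sorted(table):
        cells = [str(table[family].get(name, "-")) for name in set_names]
        lines.append("\t".join([family] + cells))
    return "\n".join(lines)


table, set_names = summarize(rows)
print(format_table(table, set_names))
```

A '-' cell makes it easy to spot gene families that do not yet have a result from every pipeline.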

Additional background information:

The best place to start is the wiki describing the TR architecture in general, which is available at:

https://pods.iplantcollaborative.org/wiki/display/iptol/1.0+Architecture

This includes the 

A set of powerpoint slides documenting the TR database is available at

http://www.slideshare.net/j_estill/james-estill-ievobiofinalpowerpoint-8416065

and a on the TR tools is attached to this wiki.

Existing Tools

Live demo version of the viewer:

The current demonstration of the Tree Reconciliation viewer is available at:

http://tr.iplantcollaborative.org/Tr_standalone.html

This connects to the database hosted at:

http://votan.iplantcollaborative.org

Code repositories:

The Perl code for the back-end services that connect to and populate the database is hosted on GitHub at:

https://github.com/iPlantCollaborativeOpenSource/iplant-treerec

The Java code for the interface is also on GitHub at:

https://github.com/iPlantCollaborativeOpenSource/tr-standalone

A MySQL dump of a working version of the database is currently on svn at iPlant.

(need link to get this data here)

Database Documentation:

The database schema that holds the data is described at:

Some general examples of how this database is used for topological queries are at:

https://pods.iplantcollaborative.org/wiki/display/coresw/Copy+of+SQL+Queries+-+Tree+Topological+Queries

Big Tree Viewer documentation:

The current Tree Viewer is based on the Big Tree Viewer, which is documented at:

https://pods.iplantcollaborative.org/wiki/display/iptol/Big+Tree+Viewer+Documentation

Web-API Documentation:

The Web-API is currently documented within the code repository as PerlDocs. Specifically, see the Perl modules at:
https://github.com/iPlantCollaborativeOpenSource/iplant-treerec/tree/master/lib/IPlant/TreeRec

In general 

Changes Needed

Required Changes for TR Database Support of Multiple Reconciliations

The TR database schema already supports storing information for multiple reconciliations. These reconciliations are stored in the set of reconciliation tables described at:

https://pods.iplantcollaborative.org/wiki/display/iptol/Database+Schema#DatabaseSchema-ReconciliationTables

Attributes concerning an individual reconciliation of a single gene tree to a single species tree are stored in the table 'reconciliation_attribute'. These attributes can include details of how the reconciliation was performed, what software was used, the parameters used within the software, and other details, all expressed using a set of controlled vocabulary (CV) tables. These CV tables make use of a 'Tree Reconciliation Ontology' that Jamie has developed to support the annotation of reconciled trees in the database.
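
To make the attribute lookup concrete, here is a minimal Python sketch that finds reconciliations by their software attribute. Only the table name 'reconciliation_attribute' comes from the description above. The 'cvterm' table, every column name, and the sample rows are assumptions made for illustration, and an in-memory SQLite database stands in for the MySQL TR database. The actual layout is the one documented on the Database Schema wiki page.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL TR database. Only the table name
# reconciliation_attribute comes from the text; the cvterm table and every
# column name below are assumptions made for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE reconciliation_attribute (
        reconciliation_id INTEGER,
        cvterm_id INTEGER REFERENCES cvterm(cvterm_id),
        value TEXT
    );
    INSERT INTO cvterm VALUES (1, 'software'), (2, 'parameter_set');
    INSERT INTO reconciliation_attribute VALUES
        (10, 1, 'TREEBEST'), (10, 2, 'default'),
        (11, 1, 'Phyldog'),  (11, 2, 'default');
    """
)


def reconciliations_for_software(conn, software):
    """Return ids of reconciliations whose 'software' attribute matches."""
    cursor = conn.execute(
        """
        SELECT ra.reconciliation_id
        FROM reconciliation_attribute AS ra
        JOIN cvterm AS cv ON cv.cvterm_id = ra.cvterm_id
        WHERE cv.name = ? AND ra.value = ?
        ORDER BY ra.reconciliation_id
        """,
        ("software", software),
    )
    return [row[0] for row in cursor]


print(reconciliations_for_software(conn, "TREEBEST"))
```

Fetching an entire reconciled tree set this way needs one attribute join per criterion (software, each parameter), which is one reason the retrieval of whole sets may need to be optimized.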