TR110321

Agenda

  1. Follow up from previous meeting

    1. Update on pipeline (Sheldon)
    2. Update on open sourcing and documentation (Naim)
      1. GitHub Repository: https://github.com/iPlantCollaborativeOpenSource
      2. Documentation: https://pods.iplantcollaborative.org/wiki/display/iptol/1.0+Architecture
      3. License
      4. It would be nice if we could convert the Perl comment-block documentation to standard POD documentation. This makes it easier to extract information with perldoc. Documentation can also be extracted to Confluence-style wiki docs with Pod::Simple::Wiki or to HTML with Pod::Simple::HTML. Dennis did an excellent job commenting his code, and it would be straightforward to convert this to POD style. See below for an example.
    3. Google Summer of Code (Jamie)
      1. NESCent has been accepted to the 2011 GSOC
  2. Project status update

    See: https://pods.iplantcollaborative.org/jira/browse/TR
    1. Closed & Fixed issues
    2. Open critical issues
    3. Issue classification and timeline
    4. New features

Example of converting commented code docs to POD.

Current code documentation is similar to the following:

    ##########################################################################
    # Usage      : $loader = IPlant::TreeRec::DatabaseTreeLoader->new($dbh);
    #
    # Purpose    : Initializes a new tree loader with the given database
    #              handle.
    #
    # Returns    : The new tree loader.
    #
    # Parameters : $dbh - the database handle.
    #
    # Throws     : No exception

could be written as something like

=head1 IPlant::TreeRec::DatabaseTreeLoader

=head2 Usage

$loader = IPlant::TreeRec::DatabaseTreeLoader->new($dbh);

=head2 Purpose

Initializes a new tree loader with the given database handle.

=head2 Returns

The new tree loader.

=head2 Parameters

$dbh - the database handle.

=head2 Throws

No exceptions

=cut

Thus typing

perldoc DatabaseTreeLoader.pm

yields something like:

IPlant::TreeRec::DatabaseTreeLoader

Usage

$loader = IPlant::TreeRec::DatabaseTreeLoader->new($dbh);

Purpose

Initializes a new tree loader with the given database handle.

Returns

The new tree loader.

Parameters

$dbh - the database handle.

Throws

No exceptions
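
The conversion shown above is mechanical enough to script. The following is a minimal sketch, not part of the existing TreeRec code: the function name `comment_block_to_pod` is hypothetical, and it assumes every field starts with a line of the form `# Name : text`, with continuation lines carrying no `Name :` prefix. It indents the Usage text so that POD treats it as a verbatim paragraph.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the TreeRec code base): turn one
# "# Field : value" comment block into a POD section.
sub comment_block_to_pod {
    my ($title, $block) = @_;
    my (@fields, $current);

    for my $line (split /\n/, $block) {
        next if $line =~ /^\s*#+\s*$/;    # skip rulers and bare "#" lines
        if ($line =~ /^\s*#\s*(\w+)\s+:\s*(.*)$/) {
            # Start of a new field such as "# Usage      : ..."
            $current = { name => $1, text => $2 };
            push @fields, $current;
        }
        elsif ($current && $line =~ /^\s*#\s+(\S.*)$/) {
            # Continuation line: append to the field being collected
            $current->{text} .= " $1";
        }
    }

    my $pod = "=head1 $title\n\n";
    for my $field (@fields) {
        # Indent the Usage text so POD renders it verbatim
        my $text = $field->{name} eq 'Usage'
            ? "    $field->{text}"
            : $field->{text};
        $pod .= "=head2 $field->{name}\n\n$text\n\n";
    }
    return $pod . "=cut\n";
}

# Example input, taken from the comment block shown earlier
my $block = <<'END_COMMENT';
##########################################################################
# Usage      : $loader = IPlant::TreeRec::DatabaseTreeLoader->new($dbh);
#
# Purpose    : Initializes a new tree loader with the given database
#              handle.
#
# Returns    : The new tree loader.
END_COMMENT

print comment_block_to_pod('IPlant::TreeRec::DatabaseTreeLoader', $block);
```

The generated text can be pasted above the corresponding subroutine and checked with perldoc before the old comment block is removed.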

Google Summer of Code

Example of what was on the PhyloSOC wiki (in MediaWiki format)

; Rationale :

[[File:Tr demo 01.png|right|300px|thumb|Example of reconciled tree visualization]]Tree reconciliation uses an estimate of the species tree to infer the history of gene duplication and loss, lineage sorting, lateral transfer, and other events in a gene family's history ([http://www.ncbi.nlm.nih.gov/pubmed/9126565 Page 1997]). It thus has wide applicability in genomics and molecular biology, but has been used relatively infrequently, not for lack of theory but for lack of implementations. Recently, substantial progress has been made on both algorithms and software development (Durand et al. 2006; Bensal et al. 2007), but important problems remain, including scaling implementations to the size of the largest known gene families and species trees to be estimated, and handling uncertainty in the reconstruction of both gene and species trees.

A database schema and GUI have been developed for exploration, visualization and analysis of [https://pods.iplantcollaborative.org/wiki/display/iptol/Tree+Reconciliation gene tree reconciliation] as a component of the iPlant project for [https://pods.iplantcollaborative.org/wiki/display/iptol/Home assembling the tree of life for the plant sciences]. An initial release of the [https://pods.iplantcollaborative.org/wiki/display/iptol/1.0+Architecture 1.0 version of this architecture] is being prepared [https://pods.iplantcollaborative.org/wiki/display/iptol/1.0+Repositories for release to public repositories] and will be completed by the start of the 2011 Summer of Code. This initial release comprises:

:* MySQL database containing reconciliations for over 2500 gene families in six exemplar species (poplar, grape, cucumber, papaya, soybean and Arabidopsis).

:* Pipeline in Perl for tree reconciliation using [http://treesoft.sourceforge.net/treebest.shtml TreeBeST], and Perl utilities for populating a database using these results

:* [http://en.wikipedia.org/wiki/Representational_State_Transfer Perl RESTful web service] for communication with the database

:* GUI with the ability to:

:** Search the collection of gene families by [http://www.geneontology.org/ Gene ontology] terms

:** Perform [http://en.wikipedia.org/wiki/BLAST BLAST] searches of query sequences against the database of reconciled genes

:** Visualize [http://www.geneontology.org/ GO] annotation of gene families as [http://en.wikipedia.org/wiki/Tag_cloud word clouds]

:** Visualize reconciled trees in an interface that supports interaction between gene trees and species trees

:** Retrieve and visualize speciation and gene duplication events as glyphs mapped onto trees

:** Provide summary statistics of the overall database and individual gene families

:** Provide access to alignments and sequence files for gene families

; Approach :

The following extensions to the current project would be worthwhile additions to the tree reconciliation code set and would provide valuable real-world experience to students interested in building highly scalable web applications:

:* Add tree topology comparison tools to the API - The current database supports multiple reconciliations for each gene tree. Tools to compare the topological differences among these computed reconciled trees would facilitate comparisons of program results

:* Add support for results from additional tree reconciliation programs to the analysis pipeline - The current pipeline for populating the database uses [http://treesoft.sourceforge.net/treebest.shtml TreeBeST]. Additional programs to consider include [http://prime.sbc.su.se/ PrIME].

:* Add support for storing probabilistic model results alongside the tree topologies stored in the database

:* Develop an appropriate ontology for reconciled tree terms - The existing schema uses ontologies to store terms associated with attributes of reconciled trees. Development of a standard ontology would facilitate storing rich metadata for reconciled tree reconstructions.

; Challenges :

:* A short term goal of this project is to support analysis for the [http://www.onekp.com/ 1kp project]. Visualization tools, database schema, queries and utilities must therefore scale to all genes and gene families for 1,000 species.

; Involved toolkits:

:* The use of [http://subversion.apache.org/ Subversion/SVN] and/or [http://git-scm.com/ Git] version control will be required for collaborative development

:* Perl modules: [http://www.bioperl.org/wiki/HOWTO:Trees BioPerl TreeIO], [http://dbi.perl.org/ DBI], [http://search.cpan.org/~abraxxa/DBIx-Class-0.08127/lib/DBIx/Class.pm DBIx::Class], [http://search.cpan.org/~jeteve/Apache2-REST-0.06/lib/Apache2/REST/Handler.pm Apache2::REST::Handler]

:* Java: [https://github.com/akubach/phyloviewer phyloviewer]

:* MySQL

; Degree of difficulty and needed skills :

:* Medium difficulty

:* Knowledge of Perl and MySQL required for API development

:* Familiarity with the use of SQL for topological queries on [http://en.wikipedia.org/wiki/Nested_set_model nested set indexed trees] will be an asset

;Mentors :

:*''' [[User:Jestill|Jamie Estill]]''', [http://www.plantbio.uga.edu/~jleebensmack/JLMmain.html Jim Leebens-Mack], [[User:Tjvision|Todd Vision]]
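
For readers unfamiliar with the nested set indexing listed under needed skills, the idea can be sketched in a few lines of Perl. This is an illustration only and does not reflect the actual TreeRec schema: the node structure (`name`, `children`) and the function names are hypothetical. Each node receives a left and a right index during a depth-first traversal, and one node is an ancestor of another exactly when its interval strictly contains the other's. In SQL the same test becomes a pair of comparisons on the two index columns.

```perl
use strict;
use warnings;

# Assign nested-set (left, right) indices by depth-first traversal.
# A node is a hypothetical hash: { name => ..., children => [ ... ] }.
sub index_nested_set {
    my ($node, $counter) = @_;
    my $start = 0;
    $counter //= \$start;                 # shared counter for the whole tree
    $node->{left} = ++$$counter;          # index assigned on the way down
    index_nested_set($_, $counter) for @{ $node->{children} || [] };
    $node->{right} = ++$$counter;         # index assigned on the way back up
    return $node;
}

# True when $ancestor's interval strictly contains $node's interval.
sub is_ancestor {
    my ($ancestor, $node) = @_;
    return $ancestor->{left} < $node->{left}
        && $node->{right} < $ancestor->{right};
}

# Toy tree: root -> (clade -> (A, B), C)
my $leaf_a = { name => 'A' };
my $leaf_b = { name => 'B' };
my $leaf_c = { name => 'C' };
my $clade  = { name => 'clade', children => [ $leaf_a, $leaf_b ] };
my $root   = { name => 'root',  children => [ $clade, $leaf_c ] };

index_nested_set($root);

for my $node ($root, $clade, $leaf_a, $leaf_b, $leaf_c) {
    printf "%-6s left=%d right=%d\n", @{$node}{qw(name left right)};
}
```

Because the ancestor test needs no recursion, subtree queries on an indexed tree can be answered with a single range comparison, which is why the approach suits large trees stored in a relational database.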
