Reconciled Trees in XML

Requirements

The requirements for the XML format include:

  1. Map the nodes of the guest tree onto the nodes and edges of the host tree.
    1. Node in gene tree maps to node in the species tree (speciation)
    2. Node in gene tree maps between two nodes in the species tree (duplication)
    3. Node in gene tree is ancestral to LCA node in the species tree (ancestral duplication)
  2. Tag the nodes in the guest tree by type of bifurcation (or multifurcation) event
    1. duplication event
      1. whole genome duplication
    2. speciation event
    3. horizontal transfer event
    4. deep coalescent event
  3. OTU names assigned to nodes in the host tree as well as nodes in the guest tree
    Both leaf nodes and internal nodes may need to be tagged with taxonomic names.
    1. Example. Gene families in the Bower's dataset
      1. Host Tree OTUs
        1. Poplar, Grape, Arabidopsis
      2. Guest Tree OTUs
        1. Gene family clades, with locus accession names
    2. Example. Transposable elements in corn.
      1. Host Tree OTUs
        1. Zea diploperennis, Zea luxurians, Zea mays, Zea nicaraguensis, Zea perennis
      2. Guest Tree OTUs (Transposable Elements/LTR Retrotransposons)
        1. Huck, Ji, Opie
      3. The guest OTU names may appear in all species in the host tree. 
        1. For example Huck will be a named clade in the LTR retrotransposon tree with named leaf nodes in all of the species of Zea.
  4. Element tag terms follow an ontology
  5. Represent reconciliations for multiple guest trees for a single Host tree
    1. Example. All the gene trees for Poplar, Grape, and Arabidopsis

Existing XML formats for representing trees.

The following existing XML formats could be extended to represent reconciled trees.

phyloXML

Project Page: http://www.phyloxml.org/

Schema:

Overview:

  • Basic species trees can be described with clade, name, distance, and possibly confidence elements
  • Examples of some elements for more specialized analyses: taxonomic information with scientific name, common name, authority, synonyms, rank, and taxonomy code; sequence data with gene name, sequence accession, and annotation; distribution; date; events such as duplications and speciations; basic control of tree appearance with colors and branch widths
  • Property elements allow the addition of domain specific textual and numerical data
  • Capability to describe relations between nodes (e.g. to describe nodes with more than one parent), and between sequences (e.g. to express orthology relationships)
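To make the "events such as duplications and speciations" item concrete, the following Python sketch reads a small hand-written phyloXML fragment and reports how each clade is tagged. The fragment itself is an illustration, not taken from a real dataset; the element names (events, duplications, speciations) and the namespace are given as I understand the phyloXML schema and should be checked against it. The helper names event_label and walk are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace as understood from the phyloXML schema (assumption: verify against the schema).
PHY = "{http://www.phyloxml.org}"

# Hand-written illustrative fragment: a speciation at the root,
# then a duplication leading to two Arabidopsis loci.
SNIPPET = """\
<phyloxml xmlns="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <clade>
      <events><speciations>1</speciations></events>
      <clade>
        <events><duplications>1</duplications></events>
        <clade><name>GeneAt00056</name></clade>
        <clade><name>GeneAt01337</name></clade>
      </clade>
      <clade><name>GeneVv00142</name></clade>
    </clade>
  </phylogeny>
</phyloxml>
"""


def event_label(clade):
    """Return 'duplication', 'speciation', 'other', or 'none' for one clade."""
    events = clade.find(PHY + "events")
    if events is None:
        return "none"
    for kind in ("duplications", "speciations"):
        count = events.findtext(PHY + kind)
        if count and int(count) > 0:
            return kind[:-1]  # drop the plural "s"
    return "other"


def walk(clade, path="root"):
    """Yield (path, event label) for a clade and all of its descendants."""
    yield path, event_label(clade)
    for i, child in enumerate(clade.findall(PHY + "clade")):
        yield from walk(child, f"{path}/{i}")


ROOT = ET.fromstring(SNIPPET).find(f"{PHY}phylogeny/{PHY}clade")
LABELS = dict(walk(ROOT))

for path, label in LABELS.items():
    print(path, label)
```

This only covers event tagging on the guest tree (requirement 2); it says nothing about how a guest node would be mapped onto a host node or edge.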

Publications:

Perl Support:

Java Support:

NEXML

Project Page: http://www.nexml.org/

Schema Documentation:

Overview:

  • In the first place, we're designing an XML schema. This schema (designated as namespace http://www.nexml.org/2009) is documented on our wiki; the bleeding edge version is available from svn; the source code can be browsed on our site (it's a check out from our repository which is updated every five minutes); for bug reports and feature requests please visit our issue tracker page.
  • Secondly, we're implementing NeXML read/write abilities in a number of software packages.
  • Third, we're cross-referencing the NeXML schema with the Character Data Analysis Ontology, which is being developed by other members of the EvoInfo working group.

Publications:

Perl Support:

Java Support:

Javascript Support:

NEXML Questions

  1. Can NeXML tag any node as OTUs or is this restricted to leaf nodes? For example, we may want to tag a Rosids clade within a species tree or a Huck clade within a transposable element guest tree. 
    1. Answer from Rutger Vos on 6/13/11 - "The answer is yes. There is no constraint in the standard on whether otu references are attached to internal or terminal nodes. It is possible that not all processing libraries actually implement this correctly, though. If they don't it's a bug that needs to be corrected."
  2. Can NeXML be extended to officially support reconciliations?
    1. Response from Rutger Vos (June 13th, 2011):
I don't see a "reconciliations" block making it into the standard any time soon. Rather, I would suggest something like

<tree id="host">

<node id="n1">
<meta property="gsoc:hostID" href="http://example.org/host1"/>
</node>
<!-- etc. -->
</tree>

<tree id="guest">
<node id="n2">
<meta property="gsoc:isGuestOf" href="http://example.org/host1"/>
</node>
</tree>

If you want to implement multiple reconciliations you would use nested meta annotations so that you can indicate within which (of multiple uniquely identifiable) reconciliation node n2 is a guest of n1. Likewise if you want to attach additional semantics to any particular reconciliation (e.g. speciations in guests tracking hosts).

Obviously this requires a bit more thinking, but host/guest reconciliation to me seems like the kind of metadata that is not very likely to make it into the standard and be implemented in the various libraries (i.e. also in python, ruby in addition to the java, perl and javascript you already mention) so I strongly suggest you try to work out a way to do this with annotations.
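A minimal Python sketch of how the annotation approach suggested above could be consumed: host nodes publish a URI through a gsoc:hostID annotation, guest nodes point at that URI through gsoc:isGuestOf, and the two are joined on the shared href. The wrapping trees element, the omission of namespaces, and the helper name hrefs are assumptions made for this illustration; the property names and URI come from the quoted response.

```python
import xml.etree.ElementTree as ET

# Simplified fragment based on the suggestion above (namespaces omitted for brevity).
SNIPPET = """\
<trees>
  <tree id="host">
    <node id="n1">
      <meta property="gsoc:hostID" href="http://example.org/host1"/>
    </node>
  </tree>
  <tree id="guest">
    <node id="n2">
      <meta property="gsoc:isGuestOf" href="http://example.org/host1"/>
    </node>
  </tree>
</trees>
"""


def hrefs(root, prop):
    """Yield (node id, href) for every node carrying the given meta property."""
    for node in root.iter("node"):
        for meta in node.findall("meta"):
            if meta.get("property") == prop:
                yield node.get("id"), meta.get("href")


ROOT = ET.fromstring(SNIPPET)

# Index host nodes by the URI they publish, then resolve each guest annotation.
HOST_BY_URI = {uri: node_id for node_id, uri in hrefs(ROOT, "gsoc:hostID")}
GUEST_TO_HOST = {
    node_id: HOST_BY_URI.get(uri) for node_id, uri in hrefs(ROOT, "gsoc:isGuestOf")
}

print(GUEST_TO_HOST)
```

A guest annotation whose URI matches no host node resolves to None here, which makes dangling references easy to spot.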

Comparing XML Formats

Notes for comparing formats by existing requirements discussed above.

Requirement                     | phyloXML   | NEXML
1. Host to guest node mapping   | supported? | supported?
2. Tag guest node bifurcation   | supported? | supported?
3. OTU node tagging             | supported? | supported?
4. Tag terms use ontology       | supported? | supported?

NeXML Implementation

The reconciliation meta properties that map guest and host trees live in the top-level trees element of the NeXML document. Within guest nodes, meta properties map each node to host nodes and edges.

Mapping guest and host trees

  • "trees" element
    • "tron:reconciliations" meta property  (ResourceMeta)
      • "tron:reconciliation" meta property (ResourceMeta)
        • "tron:reconciliation_id" meta property (string)
        • "tron:host_tree_id" meta property (string)
        • "tron:guest_tree_id" meta property (string)
        • "tron:reconciliation_method" meta property (ResourceMeta)
        • "tron:reconciliation_software" meta property (string)

Example of reconciliations in top level trees element

   

<trees otus="tax1" id="Trees">
        <meta property="tron:reconciliations" xsi:type="nex:ResourceMeta" id="recs" about="recs">
            <meta property="tron:reconciliation" xsi:type="nex:ResourceMeta" id="rec1" about="rec1">
                <meta property="tron:reconciliation_id" datatype="xsd:string" content="rec1"/>
                <meta property="tron:host_tree_id" datatype="xsd:string" content="host_tree1"/>
                <meta property="tron:guest_tree_id" datatype="xsd:string" content="guest_tree1"/>
                <meta property="tron:reconciliation_method" xsi:type="nex:ResourceMeta" id="meta3" about="meta3">
                    <meta property="tron:reconciliation_software" datatype="xsd:string" content="TREEBEST"/>
                </meta>
            </meta>
            <meta property="tron:reconciliation" xsi:type="nex:ResourceMeta" id="rec2" about="rec2">
                <meta property="tron:reconciliation_id" datatype="xsd:string" content="rec2"/>
                <meta property="tron:host_tree_id" datatype="xsd:string" content="host_tree1"/>
                <meta property="tron:guest_tree_id" datatype="xsd:string" content="guest_tree1"/>
                <meta property="tron:reconciliation_method" xsi:type="nex:ResourceMeta" id="meta4" about="meta4">
                    <meta property="tron:reconciliation_software" datatype="xsd:string" content="PRIMEGSR"/>
                </meta>
            </meta>
        </meta>
        <!-- host and guest tree elements follow here -->
</trees>
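As a rough sketch of how a consumer might read these annotations, the Python below walks the nested meta elements and collects one record per reconciliation. The namespace declarations on the trees element are added here so that the fragment parses on its own; the helper names children_with, literal, and read_reconciliations are hypothetical, and the tron: property names are matched as plain strings rather than resolved as CURIEs.

```python
import xml.etree.ElementTree as ET

NEX = "http://www.nexml.org/2009"
META = f"{{{NEX}}}meta"

# Self-contained copy of the example; the xmlns declarations are assumptions
# added so the fragment can be parsed in isolation.
SNIPPET = """\
<trees xmlns="http://www.nexml.org/2009"
       xmlns:nex="http://www.nexml.org/2009"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       otus="tax1" id="Trees">
  <meta property="tron:reconciliations" xsi:type="nex:ResourceMeta" id="recs" about="recs">
    <meta property="tron:reconciliation" xsi:type="nex:ResourceMeta" id="rec1" about="rec1">
      <meta property="tron:reconciliation_id" datatype="xsd:string" content="rec1"/>
      <meta property="tron:host_tree_id" datatype="xsd:string" content="host_tree1"/>
      <meta property="tron:guest_tree_id" datatype="xsd:string" content="guest_tree1"/>
      <meta property="tron:reconciliation_method" xsi:type="nex:ResourceMeta" id="meta3" about="meta3">
        <meta property="tron:reconciliation_software" datatype="xsd:string" content="TREEBEST"/>
      </meta>
    </meta>
    <meta property="tron:reconciliation" xsi:type="nex:ResourceMeta" id="rec2" about="rec2">
      <meta property="tron:reconciliation_id" datatype="xsd:string" content="rec2"/>
      <meta property="tron:host_tree_id" datatype="xsd:string" content="host_tree1"/>
      <meta property="tron:guest_tree_id" datatype="xsd:string" content="guest_tree1"/>
      <meta property="tron:reconciliation_method" xsi:type="nex:ResourceMeta" id="meta4" about="meta4">
        <meta property="tron:reconciliation_software" datatype="xsd:string" content="PRIMEGSR"/>
      </meta>
    </meta>
  </meta>
</trees>
"""


def children_with(parent, prop):
    """Direct meta children of parent whose property attribute equals prop."""
    return [m for m in parent.findall(META) if m.get("property") == prop]


def literal(parent, prop):
    """Content of the first matching literal meta child, or None."""
    found = children_with(parent, prop)
    return found[0].get("content") if found else None


def read_reconciliations(trees):
    """Collect one dict per tron:reconciliation annotation."""
    records = []
    for block in children_with(trees, "tron:reconciliations"):
        for rec in children_with(block, "tron:reconciliation"):
            method = children_with(rec, "tron:reconciliation_method")
            software = (
                literal(method[0], "tron:reconciliation_software") if method else None
            )
            records.append({
                "id": literal(rec, "tron:reconciliation_id"),
                "host": literal(rec, "tron:host_tree_id"),
                "guest": literal(rec, "tron:guest_tree_id"),
                "software": software,
            })
    return records


RECONCILIATIONS = read_reconciliations(ET.fromstring(SNIPPET))
for record in RECONCILIATIONS:
    print(record)
```

Both records point at the same host and guest tree and differ only in the software used, which is the "multiple reconciliations for one tree pair" case the layout is meant to support.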

Mappings of guest nodes to host nodes and edges

  •  "node" element
    • "tron:reconciliation_node_id" meta property (string)
    • "tron:host_node_parent" meta property (string)
    • "tron:host_node_child" meta property (string)
    • "tron:guest_node_type" meta property (string)

Example of a guest node mapping


<node id="gn4" otu="g1">
    <meta property="tron:reconciliation_node_id" datatype="xsd:string" xsi:type="nex:LiteralMeta" content="reconciled_node4"/>
    <meta property="tron:host_node_parent" datatype="xsd:string" xsi:type="nex:LiteralMeta" content="hn3"/>
    <meta property="tron:host_node_child" datatype="xsd:string" xsi:type="nex:LiteralMeta" content="hn3"/>
    <meta property="tron:guest_node_type" datatype="xsd:string" xsi:type="nex:LiteralMeta" content="leaf_node"/>
</node>
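The sketch below shows one way to interpret these per-node annotations in Python. It follows the convention stated in the deprecated draft on this page: a guest node maps to a host node when the parent and child references are equal, and to a host edge when they differ. The wrapping tree element, the namespace declarations, the second node (gn3, taken from the deprecated draft), and the function name mapping_of are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NEX = "http://www.nexml.org/2009"
META = f"{{{NEX}}}meta"
NODE = f"{{{NEX}}}node"

# Two guest nodes: gn3 sits on the host edge hn2 -> hn3, gn4 sits on host node hn3.
SNIPPET = """\
<tree xmlns="http://www.nexml.org/2009"
      xmlns:nex="http://www.nexml.org/2009"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      id="guest_tree1">
  <node id="gn3">
    <meta property="tron:reconciliation_node_id" datatype="xsd:string" xsi:type="nex:LiteralMeta" content="reconciled_node3"/>
    <meta property="tron:host_node_parent" datatype="xsd:string" xsi:type="nex:LiteralMeta" content="hn2"/>
    <meta property="tron:host_node_child" datatype="xsd:string" xsi:type="nex:LiteralMeta" content="hn3"/>
    <meta property="tron:guest_node_type" datatype="xsd:string" xsi:type="nex:LiteralMeta" content="duplication"/>
  </node>
  <node id="gn4" otu="g1">
    <meta property="tron:reconciliation_node_id" datatype="xsd:string" xsi:type="nex:LiteralMeta" content="reconciled_node4"/>
    <meta property="tron:host_node_parent" datatype="xsd:string" xsi:type="nex:LiteralMeta" content="hn3"/>
    <meta property="tron:host_node_child" datatype="xsd:string" xsi:type="nex:LiteralMeta" content="hn3"/>
    <meta property="tron:guest_node_type" datatype="xsd:string" xsi:type="nex:LiteralMeta" content="leaf_node"/>
  </node>
</tree>
"""


def mapping_of(node):
    """Summarise where one guest node sits on the host tree."""
    props = {m.get("property"): m.get("content") for m in node.findall(META)}
    parent = props["tron:host_node_parent"]
    child = props["tron:host_node_child"]
    return {
        "guest": node.get("id"),
        # Equal references mean a host node; different references mean a host edge.
        "kind": "node" if parent == child else "edge",
        "host": (parent, child),
        "type": props["tron:guest_node_type"],
    }


MAPPINGS = [mapping_of(n) for n in ET.fromstring(SNIPPET).iter(NODE)]
for mapping in MAPPINGS:
    print(mapping)
```

Encoding both cases with the same pair of properties keeps the annotation set small, at the cost of requiring every reader to apply the equal/unequal rule consistently.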

Example Trees to Represent in XML (deprecated)

Example 1

An example tree to represent in XML.

This represents a case where there is a single duplication on the branch leading up to the Arabidopsis leaf node.


Example 2

Gene duplication before the LCA of the species tree. 


Examples of Extending NEXML (deprecated)

The following workspace can be used to draft some examples.

NEXML Example 1

A draft description of the reconciled tree indicated above, using h in the tags below to represent host attributes and g to represent guest attributes. For example, he1 is host edge 1 and hn1 is host node 1, while ge1 is guest edge 1 and gn1 is guest node 1.
For information on proposed and existing tags see https://www.nescent.org/wg_evoinfo/Future_Data_Exchange_Standard#Element_description

<otus>
  <otu id="h1" label="Arabidopsis"/>
  <otu id="h2" label="Poplar"/>
  <otu id="h3" label="Grape"/>  
  <otu id="g1" label="GeneAt00056"/>
  <otu id="g2" label="GeneAt01337"/>  
  <otu id="g3" label="GenePt00711"/>  
  <otu id="g4" label="GeneVv00142"/>
</otus>

<characters>
<!-- Sequence data for each guest OTU -->
</characters>

<trees>
 <!-- HOST TREE -->
  <tree id="host_tree1" label="species tree">
   <!-- NODES -->
    <node id="hn1"/>
    <node id="hn2"/>
    <node id="hn3" otu="h1"/>
    <node id="hn4" otu="h2"/>
    <node id="hn5" otu="h3"/>
   <!-- EDGES -->
    <edge id="he1" source="hn1" target="hn2"/>
    <edge id="he2" source="hn1" target="hn5"/>
    <edge id="he3" source="hn2" target="hn3"/>
    <edge id="he4" source="hn2" target="hn4"/>
  </tree>
 <!-- GUEST TREES CAN FOLLOW-->
  <tree id="guest_tree1" label="Monkeynaut Genes">
   <!-- GUEST TREE NODES -->
    <node id ="gn1"/>
    <node id ="gn2"/>  
    <node id ="gn3"/>  
    <node id ="gn4" otu="g1"/>
    <node id ="gn5" otu="g2"/>
    <node id ="gn6" otu="g3"/>
    <node id ="gn7" otu="g4"/>
   <!-- GUEST TREE EDGES -->
    <edge id ="ge1" source="gn1" target="gn2"/>
    <edge id ="ge2" source="gn1" target="gn7"/>
    <edge id ="ge3" source="gn2" target="gn3"/>
    <edge id ="ge4" source="gn2" target="gn6"/>
    <edge id ="ge5" source="gn3" target="gn4"/>
    <edge id ="ge6" source="gn3" target="gn5"/>
  </tree>


  <!-- THE FOLLOWING IS A BIG ADDITION TO NEXML BUT WOULD
       ALLOW FOR MULTIPLE RECONCILIATIONS WITHIN A SINGLE
       FILE AS WELL AS ALLOW FOR MULTIPLE RECONCILIATIONS
       BETWEEN A HOST TREE TOPOLOGY AND A GUEST TREE TOPOLOGY.
       THIS COULD BE OF A GENERAL CLASS OF CROSS-NETWORK-MAP
       MAPPINGS OF ONE NETWORK ONTO ANOTHER.
       THE PARADIGM HERE IS THAT WE ARE MAPPING THE NODES
       OF THE GUEST TREE ONTO NODES AND EDGES OF THE
       HOST TREE.
       FOR EDGE MAPPINGS, MAP BETWEEN TWO NODES ON HOST
                          TREE TOPOLOGY
                          host_node_parent != host_node_child
       FOR NODE MAPPINGS, MAP TO SINGLE NODE ON HOST TREE
                          host_node_parent == host_node_child
  -->
  <reconciliations>
   <reconciliation id="rec1" label="Reconciled Monkeynaut Genes" 
                   host="host_tree1" guest="guest_tree1" 
                   method="method1">
    <reconciled_node id="reconciled_node1" guest_node="gn1" 
                     host_node_parent="hn1" host_node_child="hn1"
                     guest_node_type="speciation"/>
    <reconciled_node id="reconciled_node2" guest_node="gn2" 
                     host_node_parent="hn2" host_node_child="hn2"
                     guest_node_type="speciation"/>
    <reconciled_node id="reconciled_node3" guest_node="gn3"
                     host_node_parent="hn2" host_node_child="hn3"
                     guest_node_type="duplication"/>
    <reconciled_node id="reconciled_node4" guest_node="gn4"
                     host_node_parent="hn3" host_node_child="hn3"
                     guest_node_type="leaf_node"/>
    <reconciled_node id="reconciled_node5" guest_node="gn5"
                     host_node_parent="hn3" host_node_child="hn3"                     
                     guest_node_type="leaf_node"/>
    <reconciled_node id="reconciled_node6" guest_node="gn6"
                     host_node_parent="hn4" host_node_child="hn4"
                     guest_node_type="leaf_node"/>
    <reconciled_node id="reconciled_node7" guest_node="gn7"
                     host_node_parent="hn5" host_node_child="hn5"
                     guest_node_type="leaf_node"/>
   </reconciliation>
  </reconciliations>


  <methods>
   <method id="method1"/>
   <!-- MIAPA COMPLIANT METHODS DESCRIPTIONS HERE -->
  </methods>

</trees>

Here is the example as a NeXML file parsable by BioPerl and Bio::Phylo. xmllint complains only about the <reconciliations> element, which is not defined in the NeXML schema.
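Because the reconciliations element is outside the schema, a schema validator cannot check its references. The Python sketch below is one possible consistency check, under the draft's own convention (equal parent and child means a host node, unequal means a host edge). It works on a condensed, namespace-free copy of the draft; the function name check and the wording of the messages are hypothetical.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the draft above: OTUs, labels, and guest edges are left out
# because the check below does not use them.
DRAFT = """\
<trees>
  <tree id="host_tree1">
    <node id="hn1"/><node id="hn2"/><node id="hn3"/><node id="hn4"/><node id="hn5"/>
    <edge id="he1" source="hn1" target="hn2"/>
    <edge id="he2" source="hn1" target="hn5"/>
    <edge id="he3" source="hn2" target="hn3"/>
    <edge id="he4" source="hn2" target="hn4"/>
  </tree>
  <tree id="guest_tree1">
    <node id="gn1"/><node id="gn2"/><node id="gn3"/><node id="gn4"/>
    <node id="gn5"/><node id="gn6"/><node id="gn7"/>
  </tree>
  <reconciliations>
    <reconciliation id="rec1" host="host_tree1" guest="guest_tree1">
      <reconciled_node guest_node="gn1" host_node_parent="hn1" host_node_child="hn1"/>
      <reconciled_node guest_node="gn2" host_node_parent="hn2" host_node_child="hn2"/>
      <reconciled_node guest_node="gn3" host_node_parent="hn2" host_node_child="hn3"/>
      <reconciled_node guest_node="gn4" host_node_parent="hn3" host_node_child="hn3"/>
      <reconciled_node guest_node="gn5" host_node_parent="hn3" host_node_child="hn3"/>
      <reconciled_node guest_node="gn6" host_node_parent="hn4" host_node_child="hn4"/>
      <reconciled_node guest_node="gn7" host_node_parent="hn5" host_node_child="hn5"/>
    </reconciliation>
  </reconciliations>
</trees>
"""


def check(trees):
    """Return a list of human-readable problems; an empty list means consistent."""
    by_id = {t.get("id"): t for t in trees.findall("tree")}
    problems = []
    for rec in trees.iter("reconciliation"):
        host, guest = by_id[rec.get("host")], by_id[rec.get("guest")]
        host_nodes = {n.get("id") for n in host.findall("node")}
        host_edges = {(e.get("source"), e.get("target")) for e in host.findall("edge")}
        guest_nodes = {n.get("id") for n in guest.findall("node")}
        mapped = set()
        for rn in rec.findall("reconciled_node"):
            g = rn.get("guest_node")
            p, c = rn.get("host_node_parent"), rn.get("host_node_child")
            mapped.add(g)
            if g not in guest_nodes:
                problems.append(f"{g}: not a node of the guest tree")
            if p == c:
                # Node mapping: the single referenced host node must exist.
                if p not in host_nodes:
                    problems.append(f"{g}: unknown host node {p}")
            elif (p, c) not in host_edges:
                # Edge mapping: parent -> child must be an edge of the host tree.
                problems.append(f"{g}: no host edge {p} -> {c}")
        problems += [f"{g}: not mapped" for g in sorted(guest_nodes - mapped)]
    return problems


PROBLEMS = check(ET.fromstring(DRAFT))
print(PROBLEMS or "reconciliation is internally consistent")
```

The check is purely referential: it confirms that every guest node is mapped and that each mapping targets an existing host node or edge, but it does not judge whether the assigned event types are biologically sensible.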