Reconciled Trees in XML
Requirements
The requirements for the XML format include:
- Map the nodes of the guest tree onto the nodes and edges of the host tree.
  - A node in the gene tree maps to a node in the species tree (speciation)
  - A node in the gene tree maps between two nodes in the species tree (duplication)
  - A node in the gene tree is ancestral to the LCA node in the species tree (ancestral duplication)
- Tag the nodes in the guest tree by type of bifurcation (or multifurcation) event
  - duplication event
    - whole genome duplication
  - speciation event
  - horizontal transfer event
  - deep coalescent event
- OTU names assigned to nodes in the host tree as well as nodes in the guest tree. Both leaf nodes and internal nodes may need to be tagged with taxonomic names.
  - Example. Gene families in the Bower's dataset
    - Host Tree OTUs
      - Poplar, Grape, Arabidopsis
    - Guest Tree OTUs
      - Gene family clades, with locus accession names
  - Example. Transposable elements in corn.
    - Host Tree OTUs
      - Zea diploperennis, Zea luxurians, Zea mays, Zea nicaraguensis, Zea perennis
    - Guest Tree OTUs (Transposable Elements/LTR Retrotransposons)
      - Huck, Ji, Opie
      - The guest OTU names may appear in all species in the host tree.
        - For example, Huck will be a named clade in the LTR retrotransposon tree with named leaf nodes in all of the species of Zea.
- Element tag terms follow an ontology
- Represent reconciliations for multiple guest trees for a single host tree
  - Example. All the gene trees for Poplar, Grape and Arabidopsis
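The mapping and tagging requirements above can be sketched as a small data model. This is only an illustration in Python, not part of any proposed format: the class names, the field names, and the use of `None` for a guest node that predates the host root are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EventType(Enum):
    """Event tags from the requirements list (identifiers are illustrative)."""
    DUPLICATION = "duplication"
    WHOLE_GENOME_DUPLICATION = "whole_genome_duplication"
    SPECIATION = "speciation"
    HORIZONTAL_TRANSFER = "horizontal_transfer"
    DEEP_COALESCENCE = "deep_coalescence"


@dataclass(frozen=True)
class NodeMapping:
    """Places one guest node on the host tree.

    host_parent == host_child : the guest node sits on a host node
    host_parent != host_child : the guest node sits on the host edge
                                between those two nodes
    host_parent is None       : the guest node is older than the host root
                                (assumed encoding of an ancestral duplication)
    """
    guest_node: str
    host_parent: Optional[str]
    host_child: str
    event: EventType

    def placement(self) -> str:
        if self.host_parent is None:
            return "above_root"
        if self.host_parent == self.host_child:
            return "node"
        return "edge"


speciation = NodeMapping("gn1", "hn1", "hn1", EventType.SPECIATION)
duplication = NodeMapping("gn3", "hn2", "hn3", EventType.DUPLICATION)
print(speciation.placement(), duplication.placement())  # node edge
```

Keeping the event tag separate from the placement means a format can record, for instance, a horizontal transfer on an edge without adding a new mapping rule.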
Existing XML formats for representing trees.
The following existing XML formats could be extended to represent reconciled trees.
phyloXML
Project Page: http://www.phyloxml.org/
Schema:
- Schema Overview - http://www.phyloxml.org/documentation/version_1.10/phyloxml_xsd.png
Overview:
- Basic species trees can be described with clade, name, distance, and possibly confidence elements
- Examples of elements for more specialized analyses: taxonomic information with scientific name, common name, authority, synonyms, rank, and taxonomy code; sequence data with gene name, sequence accession, and annotation; distribution; date; events such as duplications and speciations; basic control of tree appearance with colors and branch widths
- Property elements allow the addition of domain specific textual and numerical data
- Capability to describe relations between nodes (e.g. to describe nodes with more than one parent), and between sequences (e.g. to express orthology relationships)
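The event support mentioned above covers the event-tagging requirement, though not by itself the guest-to-host mapping. A minimal sketch of reading those tags, assuming Python's standard `xml.etree.ElementTree` and a tiny hand-written gene tree; the helper name `event_of` is hypothetical and the gene names are borrowed from the examples on this page.

```python
import xml.etree.ElementTree as ET

PHYLOXML_NS = {"phy": "http://www.phyloxml.org"}

# Hand-written illustration: one speciation at the root, one duplication below it.
DOC = """\
<phyloxml xmlns="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <clade>
      <events><speciations>1</speciations></events>
      <clade>
        <events><duplications>1</duplications></events>
        <clade><name>GeneAt00056</name></clade>
        <clade><name>GeneAt01337</name></clade>
      </clade>
      <clade><name>GeneVv00142</name></clade>
    </clade>
  </phylogeny>
</phyloxml>
"""


def event_of(clade):
    """Return 'duplication', 'speciation', or None for one clade element."""
    events = clade.find("phy:events", PHYLOXML_NS)
    if events is None:
        return None
    if int(events.findtext("phy:duplications", "0", PHYLOXML_NS)) > 0:
        return "duplication"
    if int(events.findtext("phy:speciations", "0", PHYLOXML_NS)) > 0:
        return "speciation"
    return None


root = ET.fromstring(DOC)
# iter() walks clades in document order; leaves carry no events element.
tagged = [event_of(c) for c in root.iter("{http://www.phyloxml.org}clade")]
internal_events = [e for e in tagged if e is not None]
print(internal_events)  # ['speciation', 'duplication']
```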
Publications:
- Han M.V. and Zmasek C.M. (2009) "phyloXML: XML for evolutionary biology and comparative genomics." BMC Bioinformatics. 10:356
Perl Support:
Java Support:
- forester library
NEXML
Project Page: http://www.nexml.org/
Schema Documentation:
Overview:
- In the first place, we're designing an XML schema. This schema (designated as namespace http://www.nexml.org/2009) is documented on our wiki; the bleeding edge version is available from svn; the source code can be browsed on our site (it's a check out from our repository which is updated every five minutes); for bug reports and feature requests please visit our issue tracker page.
- Secondly, we're implementing NeXML read/write abilities in a number of software packages.
- Third, we're cross-referencing the NeXML schema with the Character Data Analysis Ontology, which is being developed by other members of the EvoInfo working group.
Publications:
Perl Support:
- BioPerl
- Bio-Phylo
Java Support:
Javascript
NEXML Questions
- Can NeXML tag any node as OTUs or is this restricted to leaf nodes? For example, we may want to tag a Rosids clade within a species tree or a Huck clade within a transposable element guest tree.
- Answer from Rutger Vos on 6/13/11 - "The answer is yes. There is no constraint in the standard on whether otu references are attached to internal or terminal nodes. It is possible that not all processing libraries actually implement this correctly, though. If they don't it's a bug that needs to be corrected."
- Can NeXML be extended to officially support reconciliations?
- Response from Rutger Vos (June 13th, 2011):
I don't see a "reconciliations" block making it into the standard any time soon. Rather, I would suggest something like

<tree id="host">
  <node id="n1">
    <meta property="gsoc:hostID" href="http://example.org/host1"/>
  </node>
  <!-- etc. -->
</tree>
<tree id="guest">
  <node id="n2">
    <meta property="gsoc:isGuestOf" href="http://example.org/host1"/>
  </node>
</tree>

If you want to implement multiple reconciliations you would use nested meta annotations so that you can indicate within which (of multiple uniquely identifiable) reconciliation node n2 is a guest of n1. Likewise if you want to attach additional semantics to any particular reconciliation (e.g. speciations in guests tracking hosts). Obviously this requires a bit more thinking, but host/guest reconciliation to me seems like the kind of metadata that is not very likely to make it into the standard and be implemented in the various libraries (i.e. also in python, ruby in addition to the java, perl and javascript you already mention) so I strongly suggest you try to work out a way to do this with annotations.
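To show how such annotations could be consumed, here is a minimal sketch that pairs guest nodes with host nodes through their shared `href`. It assumes Python's standard `xml.etree.ElementTree`; the wrapping `<trees>` element and the helper `nodes_by_property` are additions made only so the fragment parses and runs standalone, and the `gsoc:` properties are the placeholder names from the response, not an established vocabulary.

```python
import xml.etree.ElementTree as ET

# The suggested annotations, wrapped in <trees> so the fragment has one root.
# The guest node is closed with </node> so that the fragment is well-formed.
# gsoc: appears only inside attribute values, so no namespace declaration
# is needed to parse this.
FRAGMENT = """\
<trees>
  <tree id="host">
    <node id="n1">
      <meta property="gsoc:hostID" href="http://example.org/host1"/>
    </node>
  </tree>
  <tree id="guest">
    <node id="n2">
      <meta property="gsoc:isGuestOf" href="http://example.org/host1"/>
    </node>
  </tree>
</trees>
"""


def nodes_by_property(root, prop):
    """Yield (node id, href) for every node carrying the given meta property."""
    for node in root.iter("node"):
        for meta in node.findall("meta"):
            if meta.get("property") == prop:
                yield node.get("id"), meta.get("href")


root = ET.fromstring(FRAGMENT)
host_by_href = {href: nid for nid, href in nodes_by_property(root, "gsoc:hostID")}
guest_to_host = {
    nid: host_by_href[href]
    for nid, href in nodes_by_property(root, "gsoc:isGuestOf")
}
print(guest_to_host)  # {'n2': 'n1'}
```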
Comparing XML Formats
Notes comparing the formats against the requirements discussed above.
| Requirement | phyloXML | NEXML |
|---|---|---|
| 1. Host to guest node mapping | supported? | supported? |
| 2. Tag guest node bifurcation | supported? | supported? |
| 3. OTU node tagging | supported? | supported? |
| 4. Tag terms use ontology | supported? | supported? |
NeXML Implementation
The reconciliation meta properties that map guest trees to host trees live in the top-level trees element of the NeXML document. Within guest nodes, meta properties map each node to host nodes and edges.
Mapping guest and host trees
- "trees" element
  - "tron:reconciliations" meta property (ResourceMeta)
    - "tron:reconciliation" meta property (ResourceMeta)
      - "tron:reconciliation_id" meta property (string)
      - "tron:host_tree_id" meta property (string)
      - "tron:guest_tree_id" meta property (string)
      - "tron:reconciliation_method" meta property (ResourceMeta)
        - "tron:reconciliation_software" meta property (string)
Example of reconciliations in the top-level trees element
<trees otus="tax1" id="Trees">
  <meta property="tron:reconciliations" xsi:type="nex:ResourceMeta" id="recs" about="recs">
    <meta property="tron:reconciliation" xsi:type="nex:ResourceMeta" id="rec1" about="rec1">
      <meta property="tron:reconciliation_id" datatype="xsd:string" content="rec1"/>
      <meta property="tron:host_tree_id" datatype="xsd:string" content="host_tree1"/>
      <meta property="tron:guest_tree_id" datatype="xsd:string" content="guest_tree1"/>
      <meta property="tron:reconciliation_method" xsi:type="nex:ResourceMeta" id="meta3" about="meta3">
        <meta property="tron:reconciliation_software" datatype="xsd:string" content="TREEBEST"/>
      </meta>
    </meta>
    <meta property="tron:reconciliation" xsi:type="nex:ResourceMeta" id="rec2" about="rec2">
      <meta property="tron:reconciliation_id" datatype="xsd:string" content="rec2"/>
      <meta property="tron:host_tree_id" datatype="xsd:string" content="host_tree1"/>
      <meta property="tron:guest_tree_id" datatype="xsd:string" content="guest_tree1"/>
      <meta property="tron:reconciliation_method" xsi:type="nex:ResourceMeta" id="meta4" about="meta4">
        <meta property="tron:reconciliation_software" datatype="xsd:string" content="PRIMEGSR"/>
      </meta>
    </meta>
  </meta>
  <!-- host and guest tree elements follow -->
</trees>
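A reader of this layout mainly needs to walk the nested meta elements and collect the literal values. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` and a cut-down copy of the example; the `xsi` namespace declaration is added only so the fragment parses standalone, the NeXML default namespace is left out for brevity, and the helpers `literal` and `list_reconciliations` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the example: ids on the method metas are dropped and the
# element is closed so that it parses on its own.
FRAGMENT = """\
<trees xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" otus="tax1" id="Trees">
  <meta property="tron:reconciliations" xsi:type="nex:ResourceMeta" id="recs">
    <meta property="tron:reconciliation" xsi:type="nex:ResourceMeta" id="rec1">
      <meta property="tron:reconciliation_id" datatype="xsd:string" content="rec1"/>
      <meta property="tron:host_tree_id" datatype="xsd:string" content="host_tree1"/>
      <meta property="tron:guest_tree_id" datatype="xsd:string" content="guest_tree1"/>
      <meta property="tron:reconciliation_method" xsi:type="nex:ResourceMeta">
        <meta property="tron:reconciliation_software" datatype="xsd:string" content="TREEBEST"/>
      </meta>
    </meta>
    <meta property="tron:reconciliation" xsi:type="nex:ResourceMeta" id="rec2">
      <meta property="tron:reconciliation_id" datatype="xsd:string" content="rec2"/>
      <meta property="tron:host_tree_id" datatype="xsd:string" content="host_tree1"/>
      <meta property="tron:guest_tree_id" datatype="xsd:string" content="guest_tree1"/>
      <meta property="tron:reconciliation_method" xsi:type="nex:ResourceMeta">
        <meta property="tron:reconciliation_software" datatype="xsd:string" content="PRIMEGSR"/>
      </meta>
    </meta>
  </meta>
</trees>
"""


def literal(parent, prop):
    """Return the content of the first nested meta with this property, or None."""
    for meta in parent.iter("meta"):
        if meta.get("property") == prop:
            return meta.get("content")
    return None


def list_reconciliations(trees):
    """Collect one summary dict per tron:reconciliation annotation."""
    rows = []
    for meta in trees.iter("meta"):
        if meta.get("property") == "tron:reconciliation":
            rows.append({
                "id": literal(meta, "tron:reconciliation_id"),
                "host": literal(meta, "tron:host_tree_id"),
                "guest": literal(meta, "tron:guest_tree_id"),
                "software": literal(meta, "tron:reconciliation_software"),
            })
    return rows


reconciliations = list_reconciliations(ET.fromstring(FRAGMENT))
for row in reconciliations:
    print(row["id"], row["host"], row["guest"], row["software"])
```

Here both reconciliations point at the same host and guest tree and differ only in the software that produced them, which is the multiple-reconciliation case from the requirements.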
Mappings of guest nodes to host nodes and edges
- "node" element
  - "tron:reconciliation_node_id" meta property (string)
  - "tron:host_node_parent" meta property (string)
  - "tron:host_node_child" meta property (string)
  - "tron:guest_node_type" meta property (string)
Example of a guest node mapping
<node id="gn4" otu="g1">
  <meta property="tron:reconciliation_node_id" datatype="xsd:string" xsi:type="nex:LiteralMeta" content="reconciled_node4"/>
  <meta property="tron:host_node_parent" datatype="xsd:string" xsi:type="nex:LiteralMeta" content="hn3"/>
  <meta property="tron:host_node_child" datatype="xsd:string" xsi:type="nex:LiteralMeta" content="hn3"/>
  <meta property="tron:guest_node_type" datatype="xsd:string" xsi:type="nex:LiteralMeta" content="leaf_node"/>
</node>
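Whether a guest node sits on a host node or on a host edge follows from comparing the two host references: equal values mean a node mapping, different values mean the edge between them. A minimal sketch of that check, assuming Python's standard `xml.etree.ElementTree`; the `xsi` declaration is added so the fragment parses standalone, and the helper `read_mapping` and its dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET

NODE = """\
<node xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="gn4" otu="g1">
  <meta property="tron:reconciliation_node_id" datatype="xsd:string" xsi:type="nex:LiteralMeta" content="reconciled_node4"/>
  <meta property="tron:host_node_parent" datatype="xsd:string" xsi:type="nex:LiteralMeta" content="hn3"/>
  <meta property="tron:host_node_child" datatype="xsd:string" xsi:type="nex:LiteralMeta" content="hn3"/>
  <meta property="tron:guest_node_type" datatype="xsd:string" xsi:type="nex:LiteralMeta" content="leaf_node"/>
</node>
"""


def read_mapping(node):
    """Summarise the tron: annotations on one guest node."""
    props = {m.get("property"): m.get("content") for m in node.findall("meta")}
    parent = props["tron:host_node_parent"]
    child = props["tron:host_node_child"]
    return {
        "guest_node": node.get("id"),
        "host_parent": parent,
        "host_child": child,
        # Same host node twice: mapped to that node. Otherwise: mapped to
        # the host edge running from parent to child.
        "maps_to": "node" if parent == child else "edge",
        "type": props["tron:guest_node_type"],
    }


mapping = read_mapping(ET.fromstring(NODE))
print(mapping["guest_node"], mapping["maps_to"], mapping["type"])  # gn4 node leaf_node
```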
Example Trees to Represent in XML (deprecated)
Example 1
An example tree to represent in XML. This can be edited within the wiki using the GUI markup tool.
This represents a case where there is a single duplication on the branch leading to the Arabidopsis leaf node.
Example 2
Gene duplication before the LCA of the species tree.
Examples of Extending NEXML (deprecated)
The following workspace can be used to draft some examples.
NEXML Example 1
A working description of the reconciled tree indicated above, using h in the tags below to represent host attributes and g to represent guest attributes. For example, he1 is host edge 1 and hn1 is host node 1, while ge1 is guest edge 1 and gn1 is guest node 1.
For information on proposed and existing tags see https://www.nescent.org/wg_evoinfo/Future_Data_Exchange_Standard#Element_description
<otus>
  <otu id="h1" label="Arabidopsis"/>
  <otu id="h2" label="Poplar"/>
  <otu id="h3" label="Grape"/>
  <otu id="g1" label="GeneAt00056"/>
  <otu id="g2" label="GeneAt01337"/>
  <otu id="g3" label="GenePt00711"/>
  <otu id="g4" label="GeneVv00142"/>
</otus>
<characters>
  <!-- Sequence data for each guest OTU -->
</characters>
<trees>
  <!-- HOST TREE -->
  <tree id="host_tree1" label="species tree">
    <!-- NODES -->
    <node id="hn1"/>
    <node id="hn2"/>
    <node id="hn3" otu="h1"/>
    <node id="hn4" otu="h2"/>
    <node id="hn5" otu="h3"/>
    <!-- EDGES -->
    <edge id="he1" source="hn1" target="hn2"/>
    <edge id="he2" source="hn1" target="hn5"/>
    <edge id="he3" source="hn2" target="hn3"/>
    <edge id="he4" source="hn2" target="hn4"/>
  </tree>
  <!-- GUEST TREES CAN FOLLOW -->
  <tree id="guest_tree1" label="Monkeynaut Genes">
    <!-- GUEST TREE NODES -->
    <node id="gn1"/>
    <node id="gn2"/>
    <node id="gn3"/>
    <node id="gn4" otu="g1"/>
    <node id="gn5" otu="g2"/>
    <node id="gn6" otu="g3"/>
    <node id="gn7" otu="g4"/>
    <!-- GUEST TREE EDGES -->
    <edge id="ge1" source="gn1" target="gn2"/>
    <edge id="ge2" source="gn1" target="gn7"/>
    <edge id="ge3" source="gn2" target="gn3"/>
    <edge id="ge4" source="gn2" target="gn6"/>
    <edge id="ge5" source="gn3" target="gn4"/>
    <edge id="ge6" source="gn3" target="gn5"/>
  </tree>
  <!-- THE FOLLOWING IS A BIG ADDITION TO NEXML BUT WOULD ALLOW FOR MULTIPLE
       RECONCILIATIONS WITHIN A SINGLE FILE AS WELL AS ALLOW FOR MULTIPLE
       RECONCILIATIONS BETWEEN A HOST TREE TOPOLOGY AND A GUEST TREE TOPOLOGY.
       THIS COULD BE OF A GENERAL CLASS OF CROSS-NETWORK-MAP MAPPINGS OF ONE
       NETWORK ONTO ANOTHER. THE PARADIGM HERE IS THAT WE ARE MAPPING THE NODES
       OF THE GUEST TREE ONTO NODES AND EDGES OF THE HOST TREE.

       FOR EDGE MAPPINGS, MAP BETWEEN TWO NODES ON HOST TREE TOPOLOGY:
         host_node_parent != host_node_child
       FOR NODE MAPPINGS, MAP TO SINGLE NODE ON HOST TREE:
         host_node_parent == host_node_child -->
  <reconciliations>
    <reconciliation id="rec1" label="Reconciled Monkeynaut Genes" host="host_tree1" guest="guest_tree1" method="method1">
      <reconciled_node id="reconciled_node1" guest_node="gn1" host_node_parent="hn1" host_node_child="hn1" guest_node_type="speciation"/>
      <reconciled_node id="reconciled_node2" guest_node="gn2" host_node_parent="hn2" host_node_child="hn2" guest_node_type="speciation"/>
      <reconciled_node id="reconciled_node3" guest_node="gn3" host_node_parent="hn2" host_node_child="hn3" guest_node_type="duplication"/>
      <reconciled_node id="reconciled_node4" guest_node="gn4" host_node_parent="hn3" host_node_child="hn3" guest_node_type="leaf_node"/>
      <reconciled_node id="reconciled_node5" guest_node="gn5" host_node_parent="hn3" host_node_child="hn3" guest_node_type="leaf_node"/>
      <reconciled_node id="reconciled_node6" guest_node="gn6" host_node_parent="hn4" host_node_child="hn4" guest_node_type="leaf_node"/>
      <reconciled_node id="reconciled_node7" guest_node="gn7" host_node_parent="hn5" host_node_child="hn5" guest_node_type="leaf_node"/>
    </reconciliation>
  </reconciliations>
  <methods>
    <method id="method1"/>
    <!-- MIAPA COMPLIANT METHOD DESCRIPTIONS HERE -->
  </methods>
</trees>
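The attributes of this deprecated draft line up one-to-one with the `tron:` meta properties used in the NeXML implementation section, so a draft entry can be rewritten mechanically as an annotated guest node. A minimal sketch, assuming Python's standard `xml.etree.ElementTree` and two entries copied from the draft; the function `to_annotated_node` and the mapping table are hypothetical, and the `xsi` prefix written on the output would still need to be declared on an enclosing element before serializing to a file.

```python
import xml.etree.ElementTree as ET

# Two entries copied from the draft reconciliation above.
DRAFT = """\
<reconciliation id="rec1" host="host_tree1" guest="guest_tree1">
  <reconciled_node id="reconciled_node3" guest_node="gn3"
      host_node_parent="hn2" host_node_child="hn3" guest_node_type="duplication"/>
  <reconciled_node id="reconciled_node4" guest_node="gn4"
      host_node_parent="hn3" host_node_child="hn3" guest_node_type="leaf_node"/>
</reconciliation>
"""

# Draft attribute name -> tron meta property carrying the same value.
ATTR_TO_PROPERTY = {
    "id": "tron:reconciliation_node_id",
    "host_node_parent": "tron:host_node_parent",
    "host_node_child": "tron:host_node_child",
    "guest_node_type": "tron:guest_node_type",
}


def to_annotated_node(reconciled_node):
    """Build a <node> with one literal <meta> child per draft attribute."""
    node = ET.Element("node", {"id": reconciled_node.get("guest_node")})
    for attr, prop in ATTR_TO_PROPERTY.items():
        ET.SubElement(node, "meta", {
            "property": prop,
            "datatype": "xsd:string",
            "xsi:type": "nex:LiteralMeta",
            "content": reconciled_node.get(attr),
        })
    return node


annotated = [to_annotated_node(rn)
             for rn in ET.fromstring(DRAFT).findall("reconciled_node")]
for node in annotated:
    print(node.get("id"), [m.get("content") for m in node])
```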