DA_Workshop_Meeting_Notes
Day 1, November 21, 2009
Steve Goff Intro to iPlant: what iPlant is, what its mission is, what it won't and will do, two GC teams and 11 WGs, additional efforts such as Image analysis, Semantic web, APWeb2 (angiosperm phylogeny), Taxonomic intelligence.
How do the two GC teams interact? So far not much interaction, but with cross-cutting WGs like Data Integration and Visualization, need to explore opportunities to synergize.
Mike Sanderson / Brian O'Meara / Todd Vision / Bill Piel Intro to iPTOL project; additional activities through APWeb, Ninja, 1kp project. Resources include postdocs; Tree visualization most recent WG to get going; Trait Evolution group – once we have a tree of life, can we use it? Tree reconciliation – goal is to add value to the species tree through inference of gene duplication and loss, deep coalescence, and lateral gene transfer events on plant gene trees; Data Integration – taxonomic names to be resolved; built a prototype name resolution service available for testing, consolidating taxonomic info---344K plant names; implied 144K valid names; Tandy – is what Bill is doing accessible? Yes, but very slow, limited to plants at this time. Can several accepted names be used, or must one use only one?
Pam Soltis Data Assembly WG, goals of this workshop.
1) assembling data to build the big tree (ID sources of data, coordinate funding requests, ID CI needs for assembly) and
2) data for post tree analysis (morphology, ancestral state reconstruction, fossils, traits for eco studies [coord with iPG2P], biogeography, CI needs). This workshop is one of the activities of the WG to achieve these goals. What data should we assemble to build the tree? What strategies are needed to generate the data? How do we assemble the data-ci needs? What EOT products are relevant to data assembly for iPTOL?
Gordon Burleigh Tree building approaches: working on tree reconciliation lately; inferring phylogeny using parsimony approach, or likelihood approach (slower). Advantages – can use non-orthologous sequences, incorporates population genetic and genomic processes that cause incongruence, more species, more incongruence. Disadvantages – input is collection, so lots of error. Supertree discussion dominated by MRP, but also median tree approach using linking of optimality criteria; strategies that combine trees are amenable to many types of data, are fast, but must be used very carefully, and are not a replacement for conventional phylogenetic approaches.
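For readers unfamiliar with MRP (matrix representation with parsimony), the coding step can be sketched as below. This is a minimal illustration and not something presented at the workshop: each non-root clade of each source tree becomes one binary character, scored 1 for taxa inside the clade, 0 for taxa in that tree but outside the clade, and ? for taxa absent from that tree. Trees are assumed to be nested Python tuples of taxon names, and the function names are hypothetical. The resulting matrix would then be handed to a parsimony program, which is not shown.

```python
def clades(tree):
    """Return (leaf set of tree, list of leaf sets for every internal node)."""
    if isinstance(tree, str):          # a leaf is just a taxon name
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)               # the clade rooted at this node
    return leaves, found


def mrp_matrix(trees):
    """Encode source trees as an MRP character matrix (taxon -> string)."""
    all_taxa = sorted(set().union(*(clades(t)[0] for t in trees)))
    rows = {taxon: [] for taxon in all_taxa}
    for tree in trees:
        leaves, internal = clades(tree)
        for clade in internal:
            if clade == leaves:        # root clade carries no information
                continue
            for taxon in all_taxa:
                if taxon not in leaves:
                    rows[taxon].append("?")   # taxon missing from this tree
                else:
                    rows[taxon].append("1" if taxon in clade else "0")
    return {taxon: "".join(chars) for taxon, chars in rows.items()}


if __name__ == "__main__":
    source_trees = [(("A", "B"), "C"), (("B", "C"), "D")]
    for taxon, row in mrp_matrix(source_trees).items():
        print(taxon, row)
```

The ? entries are why the caution in the notes matters: the more the source trees differ in taxon sampling, the more of the matrix is missing data.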
Stephen Smith – access to data – 2 goals: refining alignments; addition of taxa is sort of linear, at 7K a year.
Alexis Stamatakis – big tree analysis – SPR analysis, ML search convergence, the stopping rule – 3x more construction within same time. New stuff: methods for accurate fossil placement; multistate characters, parallelization and optimization of operations on trees, hybrid MPI/Pthreads version, Exelixis rapid research dissemination reports.
AM BREAK
Bill Piel – TreeBASE now adopts a trifecta of communications protocols: NeXML (serialization for transmitting phylogenetic data), PhyloWS (API for querying/accessing), CDAO (ontology of phylogenetic terms). TreeBASE also provides URIs to all objects, thereby allowing persistent universal and resolvable identifiers and allowing third-party services to annotate data in TreeBASE, etc. Results from VoCamp (a workshop supported by iPlant, NESCent, TDWG, and LIRMM) include an extension to the PhyloWS syntax to allow phyloreferencing – i.e. the ability to point to a node in a tree using a non-tree-specific syntax. These are all standards that will be useful for iPTOL.
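The idea behind phyloreferencing can be illustrated with a small sketch. It does not use PhyloWS syntax, which is not reproduced in these notes; it only assumes that a reference is a set of specifier taxa and that it resolves to their most recent common ancestor in whatever tree it is applied to. Trees are assumed to be nested tuples of taxon names, and `resolve_mrca` is a hypothetical helper.

```python
def leaf_set(tree):
    """Return the set of taxon names at the tips of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaf_set(child)
    return result


def resolve_mrca(tree, specifiers):
    """Return the smallest subtree containing all specifier taxa, or None."""
    specifiers = set(specifiers)
    if not specifiers <= leaf_set(tree):
        return None                    # reference cannot be resolved here
    if isinstance(tree, str):
        return tree
    for child in tree:
        if specifiers <= leaf_set(child):
            return resolve_mrca(child, specifiers)   # descend while possible
    return tree                        # specifiers span several children


if __name__ == "__main__":
    reference = ["Oryza", "Arabidopsis"]
    tree_a = ((("Oryza", "Zea"), "Arabidopsis"), "Amborella")
    tree_b = ("Amborella", ("Arabidopsis", ("Zea", ("Oryza", "Sorghum"))))
    print(sorted(leaf_set(resolve_mrca(tree_a, reference))))
    print(sorted(leaf_set(resolve_mrca(tree_b, reference))))
```

The same reference picks out a node in both trees even though their topologies and taxon sets differ, which is what makes it non-tree-specific.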
Michael Donoghue – glomograms and supertrees: glomming stuff together isn't necessarily a good thing. Pruning trees is a big part of the job, done before analysis and requires expertise. Brent question- if you did have a supertree, prefer topology of all data, or in all green plants (local vs. global view)? Is there any room for simultaneous analyses?
Tandy Warnow – MRP – slide of false negative rates for taxa at different sizes, 100, 500, 1000.
Pam Soltis – TOLKIN as a model for integrating data – Tree of Life Knowledge and Information Network – serving as data management system for a number of projects, purpose to provide informatics support for phylodiversity and biodiversity research projects. Other applications – georeferencing, webmapping, import/export of sequences, a way to manage molecular data. Is it project specific? Yes.
LUNCH
Reconvene before breakouts. Integrate the AM breakout topic of CI needs into PM breakouts.
Group Discussion and summaries of breakouts/CI needs
PM BREAK
EOT - APWeb2 – Genomics in Education Workshop – undergrad students aren't keen on bioinformatics, need connection to wet lab; students unhappy if there is no two-way communication, need to do work that is real and valued; PUI colleges also value publications, so important to teachers as well. Shift toward distributed research projects. Trickle up; Orphan data project; How might we apply DNA Subway to chloroplast genome sequencing? Current annotation tools are good but not great (DOGMA is buggy); is there a way for iPlant to meld DNA Subway with DOGMA to improve it? What are next steps to implement these ideas? May need another workshop on this in particular, work with Dave, Uwe, and Martha and Sue Wessler and just start emailing us. For the pipeline do you have a voucher specimen? Similar model at Yale---go someplace, bring specimens back, bioassay and chemistry. Additional EOT outreach? Dave and Uwe said they'd be happy to help with APWeb interface.
Michael Donoghue – for Sunday, need to think about specific CI needs and how to frame those that are doable. Want to come away with 5 things we want to accomplish. So come back with your top votes.
Day 2, November 22, 2009
Todd Vision – Ontologies – curation strategies, EQ, features of ontologies – controlled vocab that enables machine communication and can be used to annotate data, logically defined relationships between terms to enable logical reasoning and expose data to generic…, serve as a community representation of knowledge.
Peter Stevens – need data standards to function, but they need to be used to be any good. Conventions and standards are accepted by consensus. What we need are original measurements, to bin on the fly when you encounter a problem.
Bill Piel – even solid ontologies don't give you a clear answer; could come up with iPlant list of genera; people want to come up with a checklist; Brent proposes Nico work with iPlant to facilitate development of higher end plant clade names; Steve G – it'd be good for someone to assess the entire landscape. Phyloreferencing system.
AM BREAK
Breakouts
Molecular data – Mike Moore – updating DOGMA to fix bugs to use for plastome annotation; two short term deliverables would be DOGMA and Morlink/Morphbank. Longer term is to connect all of these into a workflow, for a real analysis pipeline. An Information Management System would be really useful---how close are we to having something like that and can iPlant help? E.g., impetus to get TOLKIN moving faster and more available?
Engaging the Community – Dick and Brent – low-hanging fruit would be a public website for teachers to see trees and various resources for plant phylogeny, with teaching exercises, etc. Phylobook – phylogenetically structured social network tool – the first idea is a hierarchical listserv to which you could subscribe; the second is webpages for each node, wiki style, where people could upload lots of stuff, kind of like an online newsletter of announcements, news, queries, etc., that is expandable, with newsfeeds. Envision that this could overlay different databases, all hierarchically structured in the same way; front end would be public, but those who register would have access to deeper levels of tools/resources. Another layer would be geographically oriented.
Morphology – Sarah - tools to specify clade; reconstruct ancestral character traits; if you had a character of interest, might like to study its gene expression database within the context of other characters, new standards for images, other databases; G2P links might provide some images that we can use. Imagine 2 grades – one for images to be used in research, other grade useful for classroom use or general survey. You could use this method (clades) to identify significant gaps; nice to get ontology started; but concern about sufficient 'will' in the community to produce one?
Ecology and geography – Amy/Cam Webb – 2 related CI projects – 1) create a service where you submit a list or tree, then go to GBIF to retrieve point locations for all the entries for those species; another component would be climatological variables associated with points that could be retrieved. Concurrently, a need for scrubbers to look at who is providing data, locations relative to along the tree, spatial correlations, output into point datasets or polygon. 2) species traits – using Glopnet trait data across different species and retrieve across the tree at tips or anywhere across the phylogeny, to give you specific trait data at any point/node. People who are into community phylogenetics have a general need to interface with Google Earth.
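The first proposed service (submit a list or tree, get back point locations per species) could be sketched roughly as follows. This is only an outline under stated assumptions: the tree is a nested tuple of species names, and `lookup` is a hypothetical caller-supplied function standing in for whatever GBIF query the real service would make; no actual GBIF API call is shown. Names with no records are reported separately, since those are the ones a name "scrubber" would need to look at.

```python
def tip_names(tree):
    """List the species names at the tips of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    names = []
    for child in tree:
        names.extend(tip_names(child))
    return names


def occurrences_for_tree(tree, lookup):
    """Map each tip name to the (lat, lon) points returned by `lookup`.

    `lookup` is a hypothetical stand-in for the occurrence query; it takes
    a species name and returns an iterable of (lat, lon) pairs.
    """
    points = {}
    missing = []
    for name in tip_names(tree):
        records = list(lookup(name))
        if records:
            points[name] = records
        else:
            missing.append(name)       # candidates for name scrubbing
    return points, missing


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    fake_records = {
        "Quercus alba": [(38.9, -77.0), (41.3, -72.9)],
        "Acer rubrum": [(35.6, -83.5)],
    }
    tree = (("Quercus alba", "Quercus sp. nov."), "Acer rubrum")
    found, unresolved = occurrences_for_tree(
        tree, lambda name: fake_records.get(name, [])
    )
    print(found)
    print(unresolved)
```

Passing the query in as a function keeps the tree-walking logic independent of the data source, so the same outline would apply to the trait lookup in the second project.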
CI Needs Summary
Michael Donoghue – commonalities in the CI needs summaries –
Please see more detailed discussion of these points...
1) mor/phlawd pipeline
2) DOGMA
3) public website
4) hierarchically structured social network (aka FacePlant!)
5) integrating tree with morphbank type data
6) start on an ontology project
7) connect tree to GBIF
8) connect tree to Glopnet
9) APWeb
10) connect to visualization
11) TOLKIN/Reglin
Tree projects have similar basic structure; from a CI standpoint, there could be a team or a number of developers engaged on this list; this could work with molecular data also.
Val Tannen – idea of connecting the tree to sources like GBIF, Morphbank, etc. already considered in DIWG, so these suggestions make sense. It would be nice to understand the connection between tree and GBIF and trait evolution. What if those who want to do trait reconstruction don't have a precise idea of what they want to do? We need the DE to allow this exploration to be done pleasantly and easily.
The Data Assembly WG identified things that could be done and are being worked on, so there is no need to create a new iPlant group to work on these.
Dan Stanzione – iPTOL website is no problem, challenges to update, curate – need iPlant funds to do this.
DOGMA – Mike Moore, Claude, and Jim are point persons on this one---additional functionality, esp. to attract undergrads to use it.
Social networking - Marty, Rick to help with social networking part.
MOR – Stephen Smith to work on this.
Michael thinks another small meeting (at NESCent) is needed to coordinate with iPG2P Data Integration and Visualization WGs.
Dan – from iPlant's perspective, need to look at this want list in context of other WGs' lists and prioritize.
END 1:00 PM
