Post-meeting summary

The following is a general overview of the activities and results of the TNRS meeting in St. Louis. See also the meeting agenda.

Timeline

The first half of day 1 of the meeting (April 1) was devoted to presentations, with the goal of identifying community needs and use cases. In addition, Bob Magill and Chuck Mason (MBG) presented on the content and informatics capabilities of the Missouri Botanical Garden's TROPICOS database. The afternoon session began with a presentation and discussion of the challenge of concept taxonomy, led by Bob Peet, then moved on to defining user requirements for spelling correction (matching to a recognized taxonomic string, regardless of taxonomic status) and synonymy (updating a recognized taxonomic string to the "correct" or "accepted" version).
On day 2, the meeting split into two breakout groups. One group continued work on identifying additional use cases, and a second group discussed application architecture. The meeting concluded with a discussion of key action items needed to initiate development of the TNRS.

Results

Use cases. We identified a broad set of use cases (see Use cases) illustrating the spectrum of requirements for a TNRS. Many of these use cases were familiar to most participants (e.g., show me the accepted name for this name from my dataset), but several were not. An example of the latter is the need to extract and preserve information embedded within "misspelled" names (e.g., morphospecies, indications of uncertainty). The distinction between correcting spelling (matching to a published name regardless of nomenclatural status) and standardizing synonyms (updating a published name to the "correct" or "accepted" version) was emphasized.
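
The distinction between the two steps can be illustrated with a minimal sketch. This is not a design agreed at the meeting: the function names, the similarity cutoff, and the toy reference list are all assumptions made for illustration, and the fuzzy matching uses Python's standard difflib rather than any TROPICOS facility.

```python
from difflib import get_close_matches

# Toy reference data, for illustration only (not drawn from TROPICOS).
KNOWN_NAMES = [
    "Solanum lycopersicum",
    "Lycopersicon esculentum",
    "Quercus rubra",
]

# Maps a published name to the name treated as accepted in this toy list.
ACCEPTED = {
    "Lycopersicon esculentum": "Solanum lycopersicum",
}


def correct_spelling(raw, cutoff=0.8):
    """Step 1: match a raw string to a published name, whatever its status."""
    matches = get_close_matches(raw.strip(), KNOWN_NAMES, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def standardize(name):
    """Step 2: update a published name to its accepted version, if it has one."""
    return ACCEPTED.get(name, name)


def resolve(raw):
    """Run both steps, keeping the matched name separate from the accepted one."""
    matched = correct_spelling(raw)
    if matched is None:
        return None, None
    return matched, standardize(matched)


matched, accepted = resolve("Lycopersicon esculentun")  # note the misspelling
print(matched, "->", accepted)
```

Returning the matched name alongside the accepted name keeps the two operations separable, so a user who only wants spelling correction can ignore the second value.
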
A novel use case to emerge from the meeting was the need to resolve not just taxon names, but taxon observations. A taxon observation is the application of a taxon name to an organism observed at a particular location and time. Additional information can include who made the identification, when it was made, and with reference to what literature. Such information can be used to home in on the "correct" name with greater precision. The ability to use such information to resolve taxonomic names would constitute a major advance in "taxonomic intelligence". Much of the reference data needed to perform such resolution (at least for the New World) is contained within the TROPICOS database, although it could be harvested from other sources with considerable additional effort.
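
As a rough sketch of what resolving a taxon observation might involve, the example below records an observation and uses its region to narrow a set of candidate names. Everything here is hypothetical: the field names, the occurrence table, and the placeholder names ("Aus bus" and so on) are invented for illustration and do not reflect the TROPICOS data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TaxonObservation:
    """A taxon name applied to an organism seen at a given place and time."""
    name: str
    region: str
    observed_on: date
    determiner: Optional[str] = None      # who made the identification
    determined_on: Optional[date] = None  # when it was made
    reference: Optional[str] = None       # literature used


# Hypothetical occurrence table: accepted name -> regions where it occurs.
OCCURRENCE = {
    "Aus bus": {"Region A"},
    "Aus cus": {"Region B"},
}

# Hypothetical ambiguous name that different sources resolve differently.
CANDIDATES = {
    "Aus dus": ["Aus bus", "Aus cus"],
}


def narrow_candidates(obs):
    """Keep candidates known from the observation's region.

    Falls back to the full candidate list when the region rules out
    everything, so that no information is lost.
    """
    candidates = CANDIDATES.get(obs.name, [obs.name])
    in_region = [c for c in candidates if obs.region in OCCURRENCE.get(c, set())]
    return in_region or candidates


obs = TaxonObservation(name="Aus dus", region="Region A", observed_on=date(2009, 4, 1))
print(narrow_candidates(obs))
```

The determiner, determination date, and literature reference are carried on the record but not used here; a fuller version could filter on them in the same way as the region.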

Architecture. The issue of architecture was controversial. Chuck Mason (MBG) presented a model with the TNRS as an application embedded within MBG's TROPICOS database. Advantages of such a model include:

  1. Rapid development and deployment due to
    1. Maturity and internal consistency of the TROPICOS API
    2. Familiarity of developers with the API (presumably mostly consisting of existing MBG IT staff)
    3. Consistency of content (multiple synonymies share the same names table)
  2. Relatively complete coverage of the Americas
  3. Long "shelf life" within a permanent institution (Missouri Botanical Garden)
  4. Access to additional potentially useful content within TROPICOS. This includes tables of occurrence of taxa within regions, determiners' and taxonomists' names, and taxonomic literature, all of which might prove useful for a more sophisticated "taxonomic intelligence", such as the resolving of taxon observations (see above).

By contrast, disadvantages include:

  1. Potentially "closed" programming environment (it was not clear if the TROPICOS data model and source code would be made available)
  2. Difficulty of incorporating external sources of taxonomic data (i.e., regional and monographic synonymy not contained within TROPICOS)
  3. Incomplete coverage of Old World taxa and the "Tree of Life" in general due to TROPICOS's focus on the New World

Brad Boyle and Paul Morris presented somewhat different versions of an open TNRS model. The main advantages of such a model are:

  1. Incorporation of taxonomic data from various sources, including but not limited to TROPICOS
  2. Import via an established schema, presumably an expanded version of the Taxon Concept Schema (TCS)
  3. Open development environment, including complete transparency of source code and data model.
  4. Potential coverage of all taxa, not just New World

Disadvantages include:

  1. Longer development and deployment time due to
    1. Heterogeneous sources (databases, format) of taxonomic data
    2. Conflicting content of source data, especially the challenge of reconciling conflicting synonymies for the same name
  2. Lack of access to additional content within the TROPICOS database. This would preclude many of the more sophisticated aspects of taxonomic intelligence, at least over the short term.
  3. Uncertain and possibly short longevity of the TNRS if not hosted by an inherently permanent institution

For details of the models, see TNRS_architecture.

We did not reach a final decision regarding which approach will ultimately be used to develop the TNRS. This precluded developing a more formal set of requirements or setting a development timeline. A decision must be made before development can proceed.

Next steps

Immediate action items:

  1. Post final summary of detailed and specific use cases encompassing all the key needs raised during the meeting
  2. Agree upon development architecture
  3. Prepare formal requirements document, satisfying all needs encompassed by the use cases in (1). This document is currently being prepared by BIEN. Note: there has been some debate on the need for this. Prioritized use cases may be sufficient if iPlant and MO prefer to draw up requirements.