BIEN_01Jun_11

Notes from TNRS Strategic Planning Meetings

05-31-11 and 06-01-11

Goals for TNRS project

1)    Make TNRS the best possible tool for the plant community.

2)    Make TNRS more general so it is adopted and extended beyond the iPlant community.

What needs to be done to accomplish these goals?

  • Have an API that conforms to existing standards
  • Have an architecture that readily allows TNRS to be extended to include other data sources, especially for other kingdoms
  • Address the de-duplication issue.
    • Create a table to cross-index IDs from different sources.
    • Either run names from other data sources through TNRS or perhaps through GNI to get LSIDs.
    • GNA has dealt with this. It’s time to contact them.
    • Sheldon will contact Paddy Patterson.
  • We do not want to just return the union from multiple data sources.
    • Ecologists need to recognize and unite names having lexical variance and synonyms.
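
A minimal sketch of the cross-index table discussed above, assuming SQLite and Python purely for illustration. The table name `name_xref`, the shared `concept_id` key, and the function names are hypothetical; nothing here reflects the actual TNRS schema.

```python
import sqlite3


def build_crossref(conn):
    # One row per (source, source_id). Rows that share a concept_id are
    # treated as the same name across sources (hypothetical design).
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS name_xref (
            concept_id INTEGER NOT NULL,
            source     TEXT    NOT NULL,
            source_id  TEXT    NOT NULL,
            PRIMARY KEY (source, source_id)
        )
        """
    )


def link(conn, concept_id, source, source_id):
    # Record that this source's identifier belongs to the given concept.
    conn.execute(
        "INSERT OR REPLACE INTO name_xref (concept_id, source, source_id) "
        "VALUES (?, ?, ?)",
        (concept_id, source, source_id),
    )


def ids_for(conn, source, source_id):
    # Return every (source, source_id) pair linked to the same concept,
    # including the one that was looked up.
    return conn.execute(
        """
        SELECT b.source, b.source_id
        FROM name_xref AS a
        JOIN name_xref AS b ON a.concept_id = b.concept_id
        WHERE a.source = ? AND a.source_id = ?
        ORDER BY b.source, b.source_id
        """,
        (source, source_id),
    ).fetchall()
```

With made-up identifiers, linking `("NCBI", "ncbi-001")` and `("USDA", "usda-001")` to the same `concept_id` lets a lookup on either one return both. This is the kind of linkage the second use case below asks for.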

There are 2 very different use cases:

  1. The user wants the name from their preferred authoritative source (e.g., a specialist in a geographic area or plant family might prefer a flora or a manually curated list over large databases)
  2. The user wants linkage across multiple data sources (e.g., a geneticist wants to retrieve NCBI taxon IDs for a list of taxa drawn from different sources)

These use cases require 3 deliverables:

  1. Change the database schema so it can handle additional data sources, including a new table that links the identifiers of names that match across datasets
  2. Change the algorithm to support hierarchical source matching
  3. Give the interface a way for the user to define a hierarchy of sources, chosen from all available sources
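
One way to read "hierarchical source matching" is sketched below in Python: the user supplies an ordered list of sources, and the first source in that order with a match wins. The function name `resolve` and the dictionary-based lookup are assumptions made for illustration. Exact-string lookup stands in for the real matching step, which is not described in these notes.

```python
def resolve(name, matches_by_source, source_priority):
    """Return (source, matched_name) from the highest-priority source
    that has a match for `name`, or None if no source matches.

    matches_by_source: {source: {submitted_name: matched_name}}
        (hypothetical stand-in for per-source match results)
    source_priority: list of source names, most preferred first
    """
    for source in source_priority:
        hit = matches_by_source.get(source, {}).get(name)
        if hit is not None:
            return source, hit
    return None
```

A regional specialist could pass their checklist first and a large database second. Reversing the list changes which source answers whenever both have a match.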

TNRS should support ingestion of Darwin Core Archives.

List of data sources

  • NCBI (first priority)
  • Kew Plant List
  • USDA PLANTS
  • High-quality regional checklists
  • uBio
  • (need Bill’s input to generate a comprehensive list)

It’s time to contact MOBOT, NY Botanical Garden and Kew to update them on the TNRS work and discuss additional ways to collaborate.

  • Martha: Arrange a time to talk with Alan Paton (Kew), Chris Freeland, Barbara Thiers

Discussion of what is borderline or definitely beyond iPlant’s purview

  • Homonyms will need to be solved as collaborative efforts with outside groups.
    • They need to contribute both expertise and developers to the effort.
    • iPlant should not become another taxonomic authority.

Some minor, but important, fixes that are needed to TNRS:

  • Synonymy is currently dealt with by randomly choosing one of the matches.
    • Better behavior: when names have exactly the same score, order them alphabetically and report the first as the top match.
    • Brad will ask Jerry to fix this for the June release.
  • Flagging results: add a column with an exclamation point to flag results when:
    • there is more than one best match (for June release)
    • there is partial matching, e.g., when the genus but not the species is matched (for June release)
    • names have potential ambiguities, e.g., children of the taxon the parent has matched to may map to other taxa (can do after the June release)
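
The tie-breaking and flagging fixes above can be sketched as follows. This is a Python illustration under stated assumptions, not the TNRS code: the `Match` class, its `partial` field, and the function `best_match` are hypothetical names. It covers only the two flags scheduled for the June release.

```python
from dataclasses import dataclass


@dataclass
class Match:
    name: str
    score: float
    partial: bool = False  # True when, e.g., only the genus was matched


def best_match(matches):
    """Pick the top match deterministically and compute the flag column.

    Candidates are ordered by score (highest first), then alphabetically
    by name, so ties no longer depend on chance. The flag is "!" when
    more than one candidate shares the best score or when the top match
    is only partial; otherwise it is empty.
    """
    if not matches:
        return None, ""
    ordered = sorted(matches, key=lambda m: (-m.score, m.name))
    top = ordered[0]
    tied = sum(1 for m in ordered if m.score == top.score)
    flag = "!" if tied > 1 or top.partial else ""
    return top, flag
```

Sorting on `(-score, name)` keeps the output stable across runs, which the current random choice does not.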

Action Items

  • Sheldon will contact Paddy Patterson about collaborating
  • Martha will arrange a time to talk with Alan Paton (Kew), Chris Freeland, Barbara Thiers
  • Naim will talk to Nicole to get the following two items added to Jira:
    1. Brad will ask Jerry to fix how synonymy is handled for the June release (order names with the same score alphabetically).
    2. Flagging results: add a column with an exclamation point to flag results when:
      • there is more than one best match (for June release)
      • there is partial matching, e.g., when the genus but not the species is matched (for June release)
      • names have potential ambiguities, e.g., children of the taxon the parent has matched to may map to other taxa (can do after the June release)