TR_25JAN10

Action items

Agenda

1. Brief review, any outstanding action items missed (5 min.)
2. Updates on development efforts (10 min.)
3. Requirements Discussion (remainder, ~45 min.)

Requirements Discussion

Outline where we're headed in the next few meetings. We will work through the following:

1. Understand how Tree Reconciliation research is done today (what is not possible, shortcomings, annoyances) - the current workflow [*]

2. Refine the current workflow into the high-level workflow

  • identify what parts of the workflow are "pre-processing"
  • determine where new development is needed (which is different than improving existing tools/APIs)
    [this will provide an understanding of the scope of prototype development]

3. Begin analysis of problem statements (which includes alternative workflows and user actions)

[*] What's not possible falls out of this effort and gives insight into "where the gaps are". These gaps become our problem statements, which turn into requirements for software development. These requirements will be scoped into projects, like the prototype development.

{Requirements Discussion will need offline feedback and further coverage potentially Feb. 1, 2010}

Notes

Attendees: Andrew Lenards, Cecile Ane, Nicole Hopkins, Adam Kubach, Sheldon McKay, Todd Vision, Jerry Lu, Bill Piel, Jim Leebens-Mack, Natalie Henriques.

Review

  • TR meetings are now bi-weekly. However, to keep momentum in the area of workflow analysis, Andy has asked members of the working group to meet Monday, Feb. 1 at the regular meeting time. Michael Gonzales, Sheldon McKay, and Natalie Henriques are not expected to attend (though Sheldon may call in).
  • Andy and Todd are planning a face-to-face meeting at NESCent in the near future. Sheldon McKay and John Bowers may also attend.
    • [Update: Andy will be at NESCent Tuesday, Feb. 9 through Friday, Feb. 12]
  • Prototype Development: Sheldon gave some updates and thoughts
    • TreeBeST can utilize a "really big" species tree and can use gene trees that are present on the species tree
    • The output of TreeBeST is a gene tree w/ annotations indicating the duplications. It does not perform the reconciliation. However, the resulting gene tree and the input species tree can be run through PrimeTV to create a "Fat Tree" represented as a static image (JPG? PNG?). That representation will prevent interactive use of the tree.
  • Action Item: Andy provided the Tree Reconciliation working group with the Call-in/WebEx information for the Core Software "Planning and Retrospective" meeting. This serves as a chance for the current development efforts to be demonstrated and feedback gathered.

Requirements Discussion - Workflow: Detailed Walkthrough

Working through the example workflow provided by Todd:

  • [User Action] Our user (Andy) copies the protein (fertility restorer, aka PPR/Rf) in FASTA format so it could be used as input
    • [Aside: If there was a gene catalog available in, say, the current Discovery Environment, we would now BLAST against it to look for homologs]
  • [User Action] Andy copies the protein sequence into the input box on the Phytome Search:Single Blast tab
    • Questions/Comments on the interface to the search
      • Are the defaults okay? The interface is one familiar to individuals using BLAST (it's just the NCBI BLAST). Jim said that he would change the Expect value (aka E-value) from 10 to 0.01, but that change should not matter because the best hit will be isolated. Regarding E-values, it was suggested that
      • Do we need to be able to map the protein back to CDS (aka Coding Sequence)? [open question]
      • We should guide users through the process by de-emphasizing the BLAST interface and encouraging them to find the gene catalog record and use that as their starting point
      • Does the BLAST interface alleviate the gene naming differences between genome projects and data sources? If they found an identifier, how is it resolved? If you want a unique name, that's a problem. If you're okay with multiple names, use the GenBank Accession ID for search. Most researchers are willing to find the Accession ID. The system (aka Discovery Environment) needs to resolve the Accession ID against the gene catalog. Along w/ the Accession ID, a select few naming schemes should be supported. Examples: AT##### (Arabidopsis), EMBL, OS#### (rice).
    • Summary - two options for entry into the system have been identified
      • 1. User may not know the gene name, so maintain a BLAST interface. The user then chooses a gene from the results list they get
      • 2. User has Accession ID (or one of the select/limited supported identifiers)
        • PlantTribes project determined people don't have a good sense of gene family names, so BLAST hits and search on Accession ID were used. PlantTribes does use GO ID (unique identifier for the term) & GO Term.
      • 3? Another entry for search might be terms that appear in accessions, like our example record. So terms like "transcription factor" would get hits in PlantTribes. "PPR" & "Rf" get hits in searches on PlantTribes. The "Definition Line" in a GenBank accession record is also a way to find information with searches in PlantTribes.
        • [REVISIT] We don't get anything beyond the definition line from the GenBank record, so "fertility restorer" from our example record would appear, but there would be no mention of "PPR" or "Rf".
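
The entry points summarized above (FASTA input goes to BLAST, a supported identifier goes to the gene catalog, anything else is treated as a search term) can be sketched as a small routing function. This is only an illustration for discussion, not a design decision: the function name, the scheme names, and the regular expressions are assumptions, and the patterns are deliberately simplified rather than authoritative definitions of each naming scheme.

```python
import re

# Hypothetical, simplified patterns for the "select few" naming schemes.
# Real formats would be taken from the gene catalog, not from this sketch.
ID_PATTERNS = {
    # Arabidopsis locus identifiers such as AT1G01010
    "arabidopsis_locus": re.compile(r"^AT[1-5CM]G\d{5}$", re.IGNORECASE),
    # Rice locus identifiers such as Os01g0100100 or LOC_Os01g01010
    "rice_locus": re.compile(r"^(LOC_)?OS\d{2}G\d+$", re.IGNORECASE),
    # GenBank/RefSeq-style accessions such as NP_123456.1
    "genbank_accession": re.compile(r"^[A-Z]{1,3}_?\d{5,8}(\.\d+)?$", re.IGNORECASE),
}


def classify_query(text):
    """Return (entry_point, scheme) for a piece of user input.

    entry_point is "blast" for FASTA input, "identifier" for a
    recognized ID, and "keyword" for everything else.
    """
    query = text.strip()
    # Option 1: a FASTA record starts with a ">" definition line.
    if query.startswith(">"):
        return ("blast", "fasta")
    # Option 2: an Accession ID or one of the supported identifiers.
    for scheme, pattern in ID_PATTERNS.items():
        if pattern.match(query):
            return ("identifier", scheme)
    # Option 3?: free-text terms such as "PPR" or "transcription factor".
    return ("keyword", None)
```

For example, classify_query("NP_123456.1") is routed to the identifier lookup, while classify_query("transcription factor") falls through to the keyword search.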
  • [User Action] Andy performs the Single BLAST search with input.
  • [User Action] Andy reviews the search results and sees a summary of the inputs, the gene families with hits, and the best scores within each family (both the bit score & E-value).
    • Questions/Comments on the search results
      • What is the "best ensemble"? In the case of our example, the best ensemble is all hits within Gene Family #139.
      • <
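
The results summary described above (gene families with hits, plus the best bit score and E-value within each family) can be sketched as follows. This is a minimal illustration under stated assumptions: the Hit record, the function names, and every hit value are made up for the example. Only Gene Family #139 and the two E-value settings (10 and 0.01) come from the discussion.

```python
from collections import namedtuple

# Hypothetical record for one BLAST hit; field names are assumptions.
Hit = namedtuple("Hit", ["subject_id", "family", "bit_score", "evalue"])


def best_hit_per_family(hits, max_evalue=10.0):
    """Map each gene family to its highest-scoring hit under the cutoff."""
    best = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue  # hit is weaker than the Expect threshold
        current = best.get(hit.family)
        if current is None or hit.bit_score > current.bit_score:
            best[hit.family] = hit
    return best


def top_family(hits, max_evalue=10.0):
    """Return the family holding the single best hit, or None if no hits pass."""
    best = best_hit_per_family(hits, max_evalue)
    if not best:
        return None
    return max(best.values(), key=lambda h: h.bit_score).family


# Made-up hits for illustration only.
example_hits = [
    Hit("geneA", 139, 412.0, 1e-120),
    Hit("geneB", 139, 380.5, 3e-110),
    Hit("geneC", 212, 55.2, 2e-5),
    Hit("geneD", 87, 28.1, 4.2),
]

loose = best_hit_per_family(example_hits, max_evalue=10.0)
strict = best_hit_per_family(example_hits, max_evalue=0.01)
```

With these made-up values, tightening the cutoff from 10 to 0.01 drops the weakest family from the summary but leaves the top family unchanged, which matches the point raised about the E-value default: the setting changes how many families are listed, not which hit is best.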