SC_20110322

iPG2P Steering Committee
March 22, 2011

Present: Matt Vaughn, Steve Welch, Martha Narro, Lenny Heath (Ruth’s CS colleague), Tom Brutnell, Jeff White, Steve Goff, Doreen Ware, Chris Myers, Chris Jordan

Notes/Agenda:

Matt's slides

DE Release (Matt)

  • Description on website
  • Last week there was a soft release of the DE (v. 0.3)
    • A technical upgrade
    • Paves the way for building robust tools, workflows and editors
  • What’s new in the DE?
    • Looks similar, but:
      • Graphical user interface is now generated by metadata through a script. It’s no longer hard-coded.
      • Matt showed the changes in the UI.
    • Also new:
      • Additional sequence genomes supported in the genomics analyses
      • JEX (job execution) is now more robust than it was in the 0.2 preview version.
        • more sophisticated and user oriented
      • Notifications are now persistent
      • OSM (object state tracking) – stores states and can play them back.
        • Important for implementing provenance tracking 
      • No new G2P tools were added; however, the existing tools are now metadata-driven.
      • New phylogenetics tools have been added.
  • What’s coming in future DE releases:
    • A unified data model
      • a home directory accessible via DE, file system or command line
    • Editor/Viewer software development kit
    • First release of tool integration
      • to enable community to readily extend functionality of DE
    • New G2P tools – being implemented by postdocs and Matt
      • Genemania
      • ermineJ
      • SIFT or other SNP assessment tool
      • Improved RNAseq
      • Data manipulation tools
      • Remainder of FASTX toolkit
      • GNU Text Utilities for text wrangling
    • What’s the unified data model? 
      • Wherever you go there you are and so is your data.
      • Data are stored in iPlant storage facility
      • Mountable file system: can make remote data appear in a folder on your desktop.
    • Viewer/Editor Toolkit
      • Developers can develop new user interfaces to tools. 
      • Very easy for static viewing of data, and still reasonably easy for interactive viewers.
      • Viewers (windows) can broadcast information and events among themselves
      • The V/E toolkit is released on our github, but not yet documented.
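
Illustration (not from Matt's slides): the note under "What's new in the DE?" that the graphical user interface is now generated from metadata rather than hard-coded can be pictured with a minimal sketch. The metadata field names, parameter types, and widget names below are all hypothetical and are not the DE's actual schema.

```python
# Hypothetical metadata for one tool; the field names are illustrative
# assumptions, not the DE's actual schema.
TOOL_METADATA = {
    "name": "example_tool",
    "parameters": [
        {"id": "input_file", "label": "Input file", "type": "file"},
        {"id": "min_quality", "label": "Minimum quality", "type": "number", "default": 20},
        {"id": "trim", "label": "Trim adapters", "type": "flag", "default": False},
    ],
}

# Map each (assumed) parameter type to the kind of widget a UI layer would draw.
WIDGET_FOR_TYPE = {"file": "FileSelector", "number": "NumberField", "flag": "Checkbox"}


def build_form(metadata):
    """Return a list of (widget, label, default) tuples derived from the metadata."""
    form = []
    for param in metadata["parameters"]:
        widget = WIDGET_FOR_TYPE[param["type"]]
        form.append((widget, param["label"], param.get("default")))
    return form


if __name__ == "__main__":
    for widget, label, default in build_form(TOOL_METADATA):
        print(f"{widget}: {label} (default={default})")
```

The point of the design is that adding a tool means adding a metadata entry, while the form-building code stays unchanged.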

Tool Integration Update (Matt)

  • Process
    • Tool integration meetings got started, then petered out. 
      • It was premature in the 0.2 version. 
      • Wanted to wait for some of the 0.3 functionality to make it easier:
        • basic JSON editor
        • syntax validator
        • command line validator
        • DE tool registration service
        • tools are still hand deployed (usually by Matt for G2P)
        • active tool in DE
      • Expect this functionality to be in place this week
    • By the tool integration workshops we hope to have improved tool integration
      • Semi-automatic tool deployment
      • The tool integrator can have a few of their friends do user acceptance testing on the tool.
    • Some tools will run jobs on iPlant resources at UA or Atmosphere. Computationally intensive jobs will run at TACC.
      • The Foundational API will be used to run on HPC resources.
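
Illustration (not from the meeting): the syntax validator mentioned above could, in its simplest form, check that a JSON tool description parses and carries the fields a registration service would need. This is a sketch under assumptions: the required field names are hypothetical, and the real DE tool description format may differ.

```python
import json

# Hypothetical required fields; the real DE tool description format may differ.
REQUIRED_FIELDS = ("name", "command", "parameters")


def validate_tool_description(text):
    """Return a list of problems found in a JSON tool description (empty if none)."""
    try:
        description = json.loads(text)
    except json.JSONDecodeError as err:
        # Report where the JSON syntax broke so the integrator can fix it quickly.
        return [f"syntax error at line {err.lineno}, column {err.colno}: {err.msg}"]
    if not isinstance(description, dict):
        return ["top-level value must be a JSON object"]
    problems = [f"missing field: {field}" for field in REQUIRED_FIELDS
                if field not in description]
    if not isinstance(description.get("parameters", []), list):
        problems.append("'parameters' must be a list")
    return problems


if __name__ == "__main__":
    print(validate_tool_description('{"name": "example_tool"}'))
```

Returning a list of problems, rather than stopping at the first one, lets a tool integrator fix several issues per iteration.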

Mockups of Tool Integration GUI

Foundational API Update (Matt)

  • Schedule for APIs
    • V 1.0 of the IO and Data APIs has been released
    • APPS API due in April
    • JOB API due in May
  • Can follow progress at http://goo.gl/w5Tww
  • There is external interest in the APIs
    • Kbase, iRODS, iPlant Semantic Web team, several PGRP projects

Discussion

  • TB: Use of the DE platform by community. Developing a number of systems biology tools that will need a home. This looks like Galaxy but hopefully much better. There was a suggestion to start dumping tools in Galaxy. Could we dump them in iPlant’s DE instead? 
  • MV: yes
  • TB: There will be an April meeting with Chen Wong(?) at Cornell. Is that too soon?
  • MV: That’s not too soon. We are very serious about tool integration. Iteration time is now much faster. Integrators still have to write JSON files, but those guys are developers and this won’t be a problem for them. 
  • TB: Need a home for the C4 Gates project. Timing seems perfect. Could manage the projects through this interface.
  • MV: Let's discuss this off-line.

Optimization Framework (Steve W)

  • Steve is doing 3 one-month sabbatic rotations at TACC.
  • In modeling, parameter estimation requires HPC resources.
  • The range of models is extremely broad in terms of the mathematics, computer approaches, and subjects being modeled.
    • Network models (metabolic, gene, continuous time differential equation based, a transcript)
      • Often implemented as SBML scripts
    • Ecophysiological models which tend to be whole plant models. 
      • Most often implemented as computer language models compiled into executables. May represent many environmental variables, genotypes, etc.
    • Linear models…
  • The challenge is how to get all these different model types into one framework.
  • The linear models may bridge the other model types.
  • What seems to be a reasonable approach is:
    • Develop some tools that will let you fit models that take the linear forms into account.
    • Allows you to segue across the model types. 
    • SBML models converted to C, TAO, parameters optimized
  • Wants to bring together a prototype and then get community input.
  • The major challenge is how to get the various pieces of such a system to work together.
  • Can see how to do much of it in the iPlant environment:
    • Have user log in to iPlant
    • Add calibration data
    • Specify the model (script file or executable)
    • Specify the options
    • Output specifications
  • This seems straightforward in the iPlant environment
  • Challenge is when job begins running and things are on different cores.
  • Looking at available protocols: 
    • Systems Biology Workbench allows tools to talk to each other and has a nice message-passing protocol.
    • Another protocol: MPI (Message Passing Interface), which is commonly used in HPC
  • Googled the OSI Model on Wikipedia
  • He’s using MPI to move messages from one computer to another, then decoding the systems biology messages and passing them to tools.
  • Goals of the prototype:
    • Use the different classes of models
    • Allow for constrained optimization
    • Penalty functions
    • Work in a cluster environment
  • Some things they are doing at TACC are innovative:
    • Set up a sphere of MPI applications; set up a second sphere and have the two dock. 
  • They are working on two examples:
    • Optimizing a set of parameters for the Arabidopsis flowering time project
    • Network model for wheat (Wacek)
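
Illustration (not Steve's prototype): the prototype goals above list constrained optimization and penalty functions. A minimal sketch of the general idea is to fit a linear model by least squares while a quadratic penalty discourages a parameter from exceeding a bound. The model, calibration data, bound, penalty weight, and plain gradient descent are all assumptions chosen for brevity; the prototype described in the meeting runs on a cluster and is not reproduced here.

```python
def penalized_loss(params, data, upper_bound, weight):
    """Sum of squared errors plus a quadratic penalty when slope > upper_bound."""
    slope, intercept = params
    sse = sum((slope * x + intercept - y) ** 2 for x, y in data)
    violation = max(0.0, slope - upper_bound)
    return sse + weight * violation ** 2


def fit(data, upper_bound, weight=100.0, step=0.001, iterations=20000):
    """Minimize penalized_loss over (slope, intercept) by plain gradient descent."""
    slope, intercept = 0.0, 0.0
    for _ in range(iterations):
        # Gradient of the sum of squared errors.
        grad_slope = sum(2 * (slope * x + intercept - y) * x for x, y in data)
        grad_intercept = sum(2 * (slope * x + intercept - y) for x, y in data)
        # Gradient of the penalty term; zero while the constraint is satisfied.
        violation = max(0.0, slope - upper_bound)
        grad_slope += 2 * weight * violation
        slope -= step * grad_slope
        intercept -= step * grad_intercept
    return slope, intercept


if __name__ == "__main__":
    # Made-up calibration data lying exactly on y = 2x + 1.
    data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
    print("loose bound:", fit(data, upper_bound=10.0))
    print("tight bound:", fit(data, upper_bound=1.5))
```

With a finite penalty weight the fitted slope settles slightly above a tight bound rather than exactly on it; raising the weight tightens the constraint but requires a smaller step size for this simple descent to remain stable.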

Action Item

  • Matt and Tom will talk offline about the DE being the home for the Gates C4 tools.