SC_20110322
iPG2P Steering Committee
March 22, 2011
Present: Matt Vaughn, Steve Welch, Martha Narro, Lenny Heath (Ruth’s CS colleague), Tom Brutnell, Jeff White, Steve Goff, Doreen Ware, Chris Myers, Chris Jordan
Notes/Agenda:
DE Release (Matt)
- Description on website
- Last week there was a soft release of the DE (v. 0.3)
- A technical upgrade
- Paves the way for building robust tools, workflows and editors
- What’s new in the DE?
- Looks similar, but:
- The graphical user interface is now generated from metadata by a script; it is no longer hard-coded.
- Matt showed the changes in the UI.
- Also new:
- Additional sequenced genomes are supported in the genomics analyses
- JEX (job execution) is now more robust than it was in the 0.2 preview version.
- more sophisticated and user oriented
- Notifications are now persistent
- OSM (object state tracking) – stores states and can play them back.
- Important for implementing provenance tracking
- No new G2P tools were added, however they are now metadata driven.
- New phylogenetics tools have been added.
- What’s coming in future DE releases:
- A unified data model
- a home directory accessible via DE, file system or command line
- Editor/Viewer software development kit
- First release of tool integration
- to enable community to readily extend functionality of DE
- New G2P tools – being implemented by postdocs and Matt
- Genemania
- ermineJ
- SIFT or other SNP assessment tool
- Improved RNAseq
- Data manipulation tools
- Remainder of FASTX toolkit
- GNU Text Utilities for text wrangling
- What’s the unified data model?
- Wherever you go there you are and so is your data.
- Data are stored in iPlant storage facility
- Mountable file system: can make remote data appear in a folder on your desktop.
- Viewer/Editor Toolkit
- Developers can develop new user interfaces to tools.
- Very easy for static viewing of data, and still reasonably easy for interactive viewers.
- Viewers (windows) can broadcast information and events among themselves
- The V/E toolkit is released on our github, but not yet documented.
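A minimal sketch of the idea noted above that viewers can broadcast information and events among themselves. This is not the V/E toolkit's API, which these notes do not describe; the names `EventBus`, `Viewer`, `listen`, `select` and the "selection" topic are all hypothetical, chosen only to illustrate a publish/subscribe pattern.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Tuple

Handler = Callable[[str, Any], None]


class EventBus:
    """Hypothetical hub that relays events between viewers."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler to be called for every event on a topic."""
        self._handlers[topic].append(handler)

    def broadcast(self, topic: str, payload: Any) -> int:
        """Deliver the payload to every subscriber; return how many were notified."""
        handlers = list(self._handlers.get(topic, []))
        for handler in handlers:
            handler(topic, payload)
        return len(handlers)


class Viewer:
    """Hypothetical viewer window that records the events it receives."""

    def __init__(self, name: str, bus: EventBus) -> None:
        self.name = name
        self.bus = bus
        self.received: List[Tuple[str, Any]] = []

    def listen(self, topic: str) -> None:
        """Start receiving events published on a topic."""
        self.bus.subscribe(topic, self._on_event)

    def _on_event(self, topic: str, payload: Any) -> None:
        self.received.append((topic, payload))

    def select(self, item: str) -> int:
        """Tell every listening viewer that an item was selected here."""
        return self.bus.broadcast("selection", {"source": self.name, "item": item})


# Usage: a tree viewer announces a selection; an alignment viewer reacts to it.
bus = EventBus()
tree = Viewer("tree", bus)
alignment = Viewer("alignment", bus)
alignment.listen("selection")
notified = tree.select("gene_42")  # "gene_42" is a made-up identifier
```

In this sketch the sending viewer does not need to know which other viewers exist; it only names a topic, which is one common way to keep independently developed viewers decoupled.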
Tool Integration Update (Matt)
- Process
- Tool integration meetings got started, then petered out.
- It was premature in the 0.2 version.
- Wanted to wait for some of the 0.3 functionality to make it easier:
- basic JSON editor
- syntax validator
- command line validator
- DE tool registration service
- tools are still deployed by hand (usually by Matt for G2P)
- active tool in DE
- Expect this functionality to be in place this week
- By the tool integration workshops we hope to have improved tool integration
- Semi-automatic tool deployment
- The tool integrator can have a few of his friends do user acceptance testing on the tool.
- Some tools will run jobs on iPlant resources at UA or Atmosphere. Computationally intensive jobs will run at TACC.
- The Foundational API will be used to run on HPC resources.
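A minimal sketch of what the syntax validator and command line validator mentioned above might check for a JSON tool description. The actual DE metadata format is not given in these notes, so the schema here (the `name`, `executable` and `parameters` fields, and the per-parameter `flag`, `label` and `required` keys) is an assumption, as is the tool `my_tool` with its flags.

```python
import json
import shlex
from typing import Any, Dict, List

REQUIRED_KEYS = ("name", "executable", "parameters")  # assumed schema, not the DE's

# Hypothetical tool description a tool integrator might write.
EXAMPLE = """
{
  "name": "My Tool",
  "executable": "my_tool",
  "parameters": [
    {"flag": "--input", "label": "Input file", "required": true},
    {"flag": "--min-length", "label": "Minimum read length"}
  ]
}
"""


def check_syntax(text: str) -> Dict[str, Any]:
    """Parse a tool description and check the (assumed) required fields."""
    try:
        tool = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(f"invalid JSON at line {err.lineno}: {err.msg}") from err
    if not isinstance(tool, dict):
        raise ValueError("tool description must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in tool]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    if not isinstance(tool["parameters"], list):
        raise ValueError("'parameters' must be a list")
    for param in tool["parameters"]:
        if not isinstance(param, dict) or "flag" not in param or "label" not in param:
            raise ValueError("each parameter needs 'flag' and 'label'")
    return tool


def build_command(tool: Dict[str, Any], values: Dict[str, str]) -> str:
    """Assemble the command line a job would run, so it can be inspected first."""
    args: List[str] = [tool["executable"]]
    for param in tool["parameters"]:
        flag = param["flag"]
        if flag in values:
            args += [flag, values[flag]]
        elif param.get("required", False):
            raise ValueError(f"no value supplied for required flag {flag}")
    return " ".join(shlex.quote(arg) for arg in args)


tool = check_syntax(EXAMPLE)
command = build_command(tool, {"--input": "reads.fq"})
```

Checking the description and the resulting command line before a tool is registered would let errors surface while the integrator is still editing the JSON, rather than after a job has been submitted.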
Mockups of Tool Integration GUI
Foundational API Update (Matt)
- Schedule for APIs
- V 1.0 of the IO and Data APIs has been released
- APPS API due in April
- JOB API due in May
- Can follow progress at http://goo.gl/w5Tww
- There is external interest in the APIs
- KBase, iRODS, iPlant Semantic Web team, several PGRP projects
Discussion
- TB: Use of the DE platform by the community. We are developing a number of systems biology tools that will need a home. This looks like Galaxy but hopefully much better. There was a suggestion to start dumping tools in Galaxy. Could we dump them in iPlant’s DE instead?
- MV: yes
- TB: There will be an April meeting with Chen Wong(?) at Cornell. Is that too soon?
- MV: That’s not too soon. We are very serious about tool integration. Iteration time is now much faster. They will still have to write JSON files, but those guys are developers and this won’t be a problem for them.
- TB: Need a home for the C4 Gates project. Timing seems perfect. Could manage the projects through this interface.
- MV: Let's discuss this off-line.
Optimization Framework (Steve W)
- Steve is doing three one-month sabbatical rotations at TACC.
- In modeling, parameter estimation requires HPC resources.
- The range of models is extremely broad in terms of the mathematics, computer approaches, and subjects being modeled.
- Network models (metabolic, gene, continuous time differential equation based, a transcript)
- Often implemented as SBML scripts
- Ecophysiological models which tend to be whole plant models.
- Most often implemented as computer language models compiled into executables. May represent many environmental variables, genotypes, etc.
- Linear models…
- The challenge is how to get all these different model types into one framework.
- The linear models may bridge the other model types.
- What seems to be a reasonable approach is:
- Develop some tools that will let you fit models that take the linear forms into account.
- Allows you to segue across the model types.
- SBML models converted to C, TAO, parameters optimized
- Wants to bring together a prototype and then get community input.
- The major challenge is how to get the various pieces of such a system to work together.
- Can see how to do much of it in the iPlant environment:
- Have the user log in to iPlant
- Add calibration data
- Specify the model (script file or executable)
- Specify the options
- Output specifications
- This seems straightforward in the iPlant environment
- Challenge is when job begins running and things are on different cores.
- Looking at available protocols:
- Systems Biology Workbench allows tools to talk to each other via a nice message-passing protocol.
- Another protocol: MPI (Message Passing Interface), which is commonly used in HPC
- Googled the OSI Model on Wikipedia
- He’s implementing MPI to move messages from one computer to another: decode systems biology messages and pass them to tools.
- Goals of the prototype:
- Use the different classes of models
- Allow for constrained optimization
- Penalty functions
- Work in a cluster environment
- Some things they are doing are innovative at TACC:
- Set up a sphere of MPI applications; set up a second sphere and have the two dock.
- They are working on two examples:
- Optimizing a set of parameters for the Arabidopsis flowering time project
- Network model for wheat (Wacek)
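A minimal sketch of the prototype goal listed above of constrained optimization with penalty functions, using a linear model as the simplest case. It is not Steve's implementation: the calibration data, the constraint (slope no larger than `b_max`), the penalty weights and the plain gradient-descent solver are all assumptions chosen to keep the example self-contained. The penalized objective is

    sum_i (a + b*x_i - y_i)^2 + mu * max(0, b - b_max)^2

and the weight mu is increased in stages, each stage starting from the previous solution.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def gradient(a: float, b: float, data: Sequence[Point],
             b_max: float, mu: float) -> Tuple[float, float]:
    """Gradient of SSE + mu * max(0, b - b_max)**2 with respect to (a, b)."""
    grad_a = sum(2.0 * (a + b * x - y) for x, y in data)
    grad_b = sum(2.0 * (a + b * x - y) * x for x, y in data)
    grad_b += 2.0 * mu * max(0.0, b - b_max)  # penalty acts only when violated
    return grad_a, grad_b


def fit_with_penalty(data: Sequence[Point], b_max: float,
                     mus: Sequence[float] = (1.0, 10.0, 100.0, 1000.0),
                     iters: int = 20000) -> Tuple[float, float]:
    """Fit y = a + b*x subject to b <= b_max by a quadratic penalty method."""
    a, b = 0.0, 0.0
    for mu in mus:
        # Fixed step 1/L, where L (trace of the Hessian) bounds the curvature.
        lipschitz = 2.0 * sum(1.0 + x * x for x, _ in data) + 2.0 * mu
        step = 1.0 / lipschitz
        for _ in range(iters):
            grad_a, grad_b = gradient(a, b, data, b_max, mu)
            a -= step * grad_a
            b -= step * grad_b
    return a, b


# Made-up calibration data lying exactly on y = 1 + 2x.
calibration = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
a_con, b_con = fit_with_penalty(calibration, b_max=1.5)    # constraint is active
a_free, b_free = fit_with_penalty(calibration, b_max=10.0)  # constraint is inactive
```

With an exterior penalty like this one the constraint is only approximately satisfied: the constrained slope ends up slightly above 1.5 and moves closer as the largest weight grows. A cluster version would distribute the model evaluations inside the sums; this sketch runs them serially.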
Action Item
- Matt and Tom will talk offline about the DE being the home for the Gates C4 tools.