DA_07FEB11
Agenda
Data intake pipeline
Update on PHLAWD
Strategy:
Establish a robust data pipeline that generates multiple sequence alignments to feed into big species tree generation
Engage iPlant faculty (Nirav Merchant, Sudha Ram, Eric Lyons) for domain expertise in data infrastructure, meta-data management and scientific workflows.
Deliverables:
Robust data upload capability (iRODS; iPlant-wide requirement)
Meta-data management
Robust data storage and retrieval (iPlant-wide requirement)
Input data validation
Multiple sequence alignment generation
PHLAWD
MORE (?)
MUSCLE, etc.
Sequence database(s)
Questions
How will data assembly feed into big trees?
How much overlap is there with 1kp?
Notes
Action Item: Sheldon will establish contact with Gordon Burleigh, and have him work with John Cazes and Eric Lyons
Sharon Wei is on maternity leave; John Cazes will assume her responsibilities in her absence
The 1kp project should be a subset of Data Assembly and not independent of it
Need a streamlined mechanism to get data into huge matrices to build huge trees, i.e. to compile and analyze data to create a tree
Need to come up with a good data model. There is huge overlap; 1kp and other data storage need to be brought into the discussion. Have all data stored in a similar format, with tools developed around that format.
Not appropriate to have only one alignment tool available; any set of alternatives that can be included would be good
RAxML and Big Tree building will communicate with the DA group by reporting their activities at the DA meetings
Establish more consistent communication with Pam and Doug Soltis