Data intake pipeline
Update on PHLAWD
- Establish a robust data pipeline that generates multiple sequence alignments to feed into big species tree generation
- Engage iPlant faculty (Nirav Merchant, Sudha Ram, Eric Lyons) for domain expertise in data infrastructure, meta-data management and scientific workflows.
- Robust data upload capability (iRODS; iPlant-wide requirement)
- Meta-data management
- Robust data storage and retrieval (iPlant-wide requirement)
- Input data validation
- Multiple sequence alignment generation
- MORE (?)
- MUSCLE, etc.
- Sequence database(s)
- How will data assembly feed into big trees?
- How much overlap with onekp?
Action Item: Sheldon will establish contact with Gordon Burleigh, and have him work with John Cazes and Eric Lyons
- Sharon Wei is on maternity leave; John Cazes will assume her responsibilities in her absence
- The 1kp project should be a subset of Data Assembly and not independent of it
- Need a streamlined mechanism to get data into huge matrices to build huge trees, i.e. to compile and analyze data to create a tree
- Need to come up with a good data model. Huge overlap; 1kp and other data storage need to be brought into the discussion. Have all data stored in a similar format, with tools developed around that format.
- Not appropriate to have only one alignment tool available; any set of alternatives that can be included would be good
- The RAxML and Big Tree building groups will communicate with the DA group by reporting their activities at the DA meetings
- Establish more consistent communication with Pam and Doug Soltis