TR_NESCent_Visit_09FEB10_Proto_Workflow
The projected image is the "pre-processing" view of a general workflow (based on the example workflow presented to the group). The modifications being discussed revolve around the two entry points introduced (searching by gene term & Single BLAST). The Single BLAST interface is narrowed to support only blastx and blastp. BLAST will run against the "iPlant Gene Catalog", which will be the 1KP Transcriptome data (likely the pilot data, about 40 species) plus some additional data (indicated in the picture as "X"). This data source will be translated into protein sequences, which is why only blastx and blastp will be used. The Gene Term search would allow users to search on GenBank identifiers, genome sequence project identifiers (AT#####, OS#####, etc.), EMBL identifiers, etc. This fork in the entry point means that results will need either a single shared "landing page" for both the Gene Term search and the Single BLAST, or a specific results page for each. The latter seems the more likely to work. BLAST results have been actively discussed in the meetings.
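Because the catalog is protein-only, the choice between the two programs follows from the query type: blastx translates a nucleotide query, while blastp takes a protein query directly. A minimal sketch of how the Single BLAST interface might pick the program is below. The function name, the nucleotide alphabet set, and the 90% threshold are illustrative assumptions, not part of the discussed design.

```python
# Characters treated as nucleotide codes (assumption: plain bases plus N).
NUCLEOTIDE_CODES = set("ACGTUN")


def choose_blast_program(sequence: str) -> str:
    """Return 'blastx' for a nucleotide query and 'blastp' for a protein query."""
    residues = [c for c in sequence.upper() if c.isalpha()]
    if not residues:
        raise ValueError("empty query sequence")
    nucleotide_fraction = sum(c in NUCLEOTIDE_CODES for c in residues) / len(residues)
    # Heuristic (assumed threshold): a query that is almost entirely
    # A/C/G/T/U/N is treated as nucleotide and must be translated (blastx).
    return "blastx" if nucleotide_fraction >= 0.9 else "blastp"


if __name__ == "__main__":
    print(choose_blast_program("ATGGCGTACGTTAGC"))
    print(choose_blast_program("MKTAYIAKQRQISFVKSHFSRQ"))
```

A real interface would probably also let the user override the guess, since short or low-complexity protein sequences can look like nucleotides.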
Gene trees of families will need to be pre-computed and available to satisfy user requests. The only valid 'branch' of #3 is then "gene tree is available." This means the work in the branch involving the "creation" of gene trees needs to be done before the trees are made available to users (the implication is that this needs to happen as a 'batch' operation whenever data is modified in or added to the gene catalog).
The outputs made available at #5 (either as user workspace files or values from the database/data-source) are as follows:
- gene tree
- "pruned" species tree
- reconciled tree (potentially in a format like NHX, uncertain of this point)
- an image of the reconciled tree in 'fat tree' representation
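As an illustration of what an NHX-style reconciled tree might look like (the format choice is still uncertain, as noted in the list), the sketch below serializes a small annotated tree. It assumes the commonly used NHX tags `S` (species name) and `D` (duplication, Y/N); the node dictionary layout, the gene names, and the `to_nhx` helper are hypothetical.

```python
def to_nhx(node: dict) -> str:
    """Serialize a nested node dict into a Newick string with NHX annotations."""
    tags = "".join(f":{key}={value}" for key, value in node.get("tags", {}).items())
    annotation = f"[&&NHX{tags}]" if tags else ""
    children = node.get("children", [])
    if children:
        inner = ",".join(to_nhx(child) for child in children)
        return f"({inner}){node.get('name', '')}{annotation}"
    return f"{node['name']}{annotation}"


if __name__ == "__main__":
    # Hypothetical two-gene family: both genes map to the same species,
    # so the reconciliation marks their common ancestor as a duplication.
    reconciled = {
        "tags": {"D": "Y"},
        "children": [
            {"name": "geneA", "tags": {"S": "Arabidopsis"}},
            {"name": "geneB", "tags": {"S": "Arabidopsis"}},
        ],
    }
    print(to_nhx(reconciled) + ";")
```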
In the list of outputs above, "the" species tree is a subset of the "BIG" species tree (the output from the Big Trees working group or, in the case of the Phase 1 prototype, the NCBI taxonomy tree). The ability to get a pruned version of the "BIG" species tree would be an attractive web service.
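The pruning step itself can be sketched in a few lines. The example below assumes a toy representation in which a leaf is a species name and an internal node is a tuple of children; the `prune_tree` name and the taxa in the usage example are illustrative only, and a real service would work on the actual "BIG" tree format.

```python
def prune_tree(tree, keep):
    """Return the subtree induced by the leaf names in `keep`, or None if empty."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    pruned = [p for p in (prune_tree(child, keep) for child in tree) if p is not None]
    if not pruned:
        return None
    if len(pruned) == 1:
        # Collapse internal nodes left with a single child.
        return pruned[0]
    return tuple(pruned)


if __name__ == "__main__":
    big_tree = ((("Arabidopsis", "Populus"), "Vitis"), ("Oryza", "Zea"))
    # Species present in a (hypothetical) gene tree:
    print(prune_tree(big_tree, {"Arabidopsis", "Oryza", "Zea"}))
```

Collapsing single-child nodes keeps the pruned tree in a shape that can be compared directly against the gene tree during reconciliation.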
Future need: the ability to add/insert a user-submitted gene or gene tree into an existing gene tree in the Discovery Environment. This leads to some user interface changes.
New entry point defined/discussed: a user uploads their gene tree and uses it in the reconciliation analysis.
The approach of using a 3-panel view that presented the inputs (a gene tree & a pruned species tree) "feeding" into a reconciled tree was discussed. This view would be a "fat tree" representation of the reconciliation. Discussion on Thursday, Feb. 11, revisited this approach when looking at the overall user interface for the prototype.