Brusslan-He Project Charter v2 (current)

iPAT Brusslan-He Project Charter June 14, 2010 Version 2.0

Project Title: Developing Computational Tools for Comparative ChIP-Seq Analysis of Leaf Senescence and for Teaching Data Analysis and Bioinformatics

Start Date: June 15; End Date: December 15

Project Justification: (problem or opportunity addressed, their Grand Challenge Question) Learn plant genomics/computational analysis; implement new analysis methods to better understand genomic predictors of leaf senescence.

Project Objectives: (deliverables should address these objectives, what is it that this project is trying to address?) Develop spike-in test data sets, validate analysis methods, analyze the Brusslan Arabidopsis leaf ChIP-Seq data, then implement new data-mining methods to detect correlations between the leaf data and gene expression and other genomic features.

Overview of Deliverables: (high-level, broad-brush strokes; details can be provided in appendices)
Finish analyses. Develop teaching materials for the existing DNASubway stops, and work with the iPg2p UHTS working group to develop DNASubway teaching data sets and tutorials for RNASeq and possibly for ChipDiff comparative, quantitative analyses. Write grant proposals for research, and jointly write a teaching grant proposal for a bioinformatics class. Locate two more faculty members at CSULB or elsewhere in the LA area, in CISE and plant biology, and get them started in computational biology.
Approach: (basic high-level steps)
Before July:  develop test spike-in data sets, compile ChipDiff, run Brusslan data sets through ChipDiff. 
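A spike-in test set for a differential caller could be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the seqanswers contest's or the Gerstein lab's procedure. It follows the two-spike-in idea: both simulated samples share one enriched region, and each sample carries one enriched region the other lacks, so the truly differential regions are known in advance and ChipDiff's calls can be scored against them. The genome length, read length, read counts, region coordinates, function names, and the BED-style output are all hypothetical; ChipDiff's own documentation should be checked for the input format it actually expects.

```python
import random

READ_LEN = 36          # hypothetical read length
GENOME_LEN = 100_000   # hypothetical single-chromosome "genome"

def simulate_sample(spikes, n_background, seed, genome_len=GENOME_LEN, read_len=READ_LEN):
    """Return sorted read start positions for one simulated ChIP-Seq sample.

    Background reads are placed uniformly along the genome; each spike-in
    (start, end, n_reads) adds n_reads reads placed uniformly inside that region.
    """
    rng = random.Random(seed)
    reads = [rng.randrange(0, genome_len - read_len) for _ in range(n_background)]
    for start, end, n_reads in spikes:
        reads.extend(rng.randrange(start, end - read_len) for _ in range(n_reads))
    return sorted(reads)

def reads_in(reads, start, end):
    """Count reads whose start position falls in [start, end)."""
    return sum(start <= pos < end for pos in reads)

def to_bed(reads, seed, chrom="chr1", read_len=READ_LEN):
    """Format reads as 6-column BED lines (hypothetical output format), random strand."""
    rng = random.Random(seed)
    return [f"{chrom}\t{pos}\t{pos + read_len}\tread{i}\t0\t{rng.choice('+-')}"
            for i, pos in enumerate(reads)]

# Hypothetical coordinates: one shared spike (should NOT be called differential)
# plus one spike unique to each sample.
SHARED = (10_000, 11_000, 500)
SPIKES_A = [SHARED, (50_000, 51_000, 500)]   # enriched only in sample A
SPIKES_B = [SHARED, (70_000, 71_000, 500)]   # enriched only in sample B

# Regions a differential caller should report, by construction.
TRUE_DIFFERENTIAL = {(50_000, 51_000): "A", (70_000, 71_000): "B"}

# Different seeds so the two samples get independent background reads.
sample_a = simulate_sample(SPIKES_A, n_background=2_000, seed=1)
sample_b = simulate_sample(SPIKES_B, n_background=2_000, seed=2)
```

Comparing the caller's output against TRUE_DIFFERENTIAL, and checking that the shared region is not reported, gives a simple check of sensitivity and false positives. The sketch deliberately ignores realistic features such as strand-shifted peak shapes and mappability, so it only exercises the caller's basic behaviour.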

At Dolan/CSHL (one-week visit): work with Uwe, Cornell, and Matt to:

   1. Develop teaching tools for the existing DNASubway lines, for both undergraduate biology classes and computer science classes.

   2. Develop new data sets and teaching tutorials for the RNASeq subway line (which should be ready about the time you arrive at Dolan). The UHTS working group's test data set spans maize leaf development, which may be too complex for teaching. You should develop learning goals and possibly select a simpler comparative data set (for example, a mutant compared with wild type, a stress treatment, or a tissue difference) that fits the learning goals of the tutorial/curriculum.

   3. Assess whether ChipDiff could be integrated into the DNASubway, both from a technical point of view and in terms of where students might obtain additional data sets of this type. Teaching tools that allow comparison of quantitative data would be useful and would fit into many classes in the curriculum.

   4. Construct a test data set (like the one from the contest at http://seqanswers.com/forums/showthread.php?t=1039, but with two different spike-ins) to validate ChipDiff. I don't think ChipDiff has been tested in this way, so that alone would make a paper! Another example of how to make simulated test sets using R is at http://www.gersteinlab.org/proj/chip-seq-simu/. Discuss possible uses of the test data set in the DNASubway: how could it fit into teaching in biology and computer science?

   5. Once you have compared the Brusslan 23d to 52d leaf samples, you'll have sets of genomic regions and genes that differ (I'll call this list the hmadls genes, for histone methylation affected during leaf senescence). Then you'll want to do some data mining to find out what other genomic features are correlated with hmadls. I'd suggest a motif analysis with MEME, which is very easy to use, and a more time-consuming analysis that mines any gene feature from TAIR, perhaps using the WEKA package (this means downloading all gene features from the TAIR website and testing which keywords, expression levels, GO terms, etc., are over-represented in the hmadls gene set). This would not necessarily be done while you are at Dolan. What might be really useful at Dolan would be to work with Cornell to see whether MEME or WEKA could be integrated into the DNASubway, and to develop learning goals for this kind of analysis in specific classes.
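For the over-representation part of the TAIR mining, one standard approach (swapped in here as an illustration, not the WEKA workflow suggested above) is a one-sided hypergeometric test per term: given N genes in the universe, K of which carry a term, how surprising is it to find at least k carriers among the n hmadls genes? The minimal sketch below uses only the Python standard library. The function names, toy gene IDs, and terms are hypothetical; a real run would use TAIR gene IDs and annotation files, and the universe should be the genes that could have appeared in the hmadls list, not just the annotated ones. Bonferroni is used as the simplest multiple-testing correction; Benjamini-Hochberg is a common, less conservative alternative.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K annotated, n drawn).

    Probability of seeing at least k annotated genes in a set of n genes
    drawn at random (without replacement) from N genes, K of which carry the term.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def enrichment(target, universe, annotations):
    """Test every term for over-representation in `target` relative to `universe`.

    annotations maps gene ID -> set of terms (GO terms, keywords, ...).
    Returns (term, k, K, p, p_bonferroni) tuples sorted by p-value.
    """
    universe = set(universe)
    target = set(target) & universe
    N, n = len(universe), len(target)
    # For each term, count carriers in the universe (K) and in the target set (k).
    term_total, term_hits = {}, {}
    for gene in universe:
        for term in annotations.get(gene, ()):
            term_total[term] = term_total.get(term, 0) + 1
            if gene in target:
                term_hits[term] = term_hits.get(term, 0) + 1
    m = len(term_total)  # number of tests, used for the Bonferroni correction
    results = []
    for term, K in term_total.items():
        k = term_hits.get(term, 0)
        p = hypergeom_upper_tail(k, N, K, n)
        results.append((term, k, K, p, min(1.0, p * m)))
    return sorted(results, key=lambda r: r[3])

# Hypothetical toy data: 10 genes, 3 of them in the "hmadls" set.
universe = [f"G{i}" for i in range(1, 11)]
annotations = {g: {"senescence"} for g in ["G1", "G2", "G3", "G4"]}
annotations.update({g: {"photosynthesis"} for g in ["G5", "G6", "G7", "G8", "G9"]})
results = enrichment({"G1", "G2", "G3"}, universe, annotations)
```

In the toy example all three target genes carry "senescence", a term held by only 4 of the 10 genes, so its uncorrected p-value is C(4,3)·C(6,0)/C(10,3) = 4/120 ≈ 0.033, while "photosynthesis", which no target gene carries, gets p = 1.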

After Dolan visit: Use DNASubway teaching tools in class, and advise other faculty who wish to use the tools. Consult with Sue Wessler to help identify faculty in the LA area who would be interested in beginning a collaboration, and help the new collaborators get started.


Success Criteria:
Understanding of data analysis and of the importance of data; a larger shared vocabulary; grant proposals for research and teaching; development of teaching modules; team teaching of a bioinformatics class; and formation of a new 'teach-one' faculty pair in the LA area.

Key Assumptions:
Resources:
Roles and Responsibilities: (include a broad statement of the roles and responsibilities of all people involved or supervising project)
Min He will carry out analyses and teach Judy Brusslan how they work; Judy will prioritize analyses, decide which features are important to examine for correlation, and explain genomics and plant biology vocabulary. Min and Judy will locate a new faculty pair and get them started on a computational biology project. Cornell, Uwe, and Matt will assist in tutorial development and in evaluating new analysis methods for possible inclusion in the DNASubway. Ann Stapleton will provide and coordinate access to iPlant resources and provide continuing consultation as required.