iPG2P NextGen Sequencing
October 30, 2009, 1pm EDT
Attendees: Matt Vaughn, Damian Gessler, Tom Brutnell, Bob Schmitz, Steve Welch, Karla Gendler, Steve Rounsley
- Brutnell will distribute questionnaire that is posted on the wiki to get feedback
- Gendler will set up and schedule a conference call to discuss polyploidy and pooled sequences
Item 1. Matt Vaughn - Implementing the NextGen pipeline
- What metadata and experimental parameters need to be collected up front
- Matt has been writing some narratives to go with pipelines; document on Wiki that talks about fundamentals
- TB will pull out of Wiki and distribute as document, send to Chris Myers, Doreen Ware, and Ruth Grene
- issue that came up is the alignment score, which has to carry through the whole pipeline
- in regards to modeling, some metadata has to be captured in reference to experiments
- list of what we need to capture: are we going to follow specific formats? MINSEQE is an emerging standard; could probably adapt MIAME to describe the experiment
- privacy and confidentiality of the experiment might cause concerns
- Dealing with polyploidy and pooled sequences (Need a separate conf call on this)
- most systems now deal only with diploids
- issues in SNP calling and representation format (dealing with gamete fusion)
- are there people who know more about this? Ed Buckler, MV, Matt Hudson, Steve Rounsley, Jermaine
- how do we extend what is already available?
- Minimal output from transcriptomics
- what is needed in tabular output other than transcript id and RPKM
- alignment score: need to know whether a read hit multiple locations or a single location; use this to color code tracks on the genome browser
- RPKM: for visualizing these data sets, we are currently collapsing to the gene level
- RPKM: how do you interpret the absence of an alignment, and how do you normalize? The assumption being made is that people are sequencing to similar depths; RPKM is the standard adopted by the mammalian community
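To make the discussion of minimal tabular output concrete, here is a minimal sketch in Python of how one row could be computed. It assumes the usual definition of RPKM (reads per kilobase of transcript per million mapped reads). The function names, the three-column layout, the "unique"/"multi"/"unaligned" labels, and the transcript id "tx1" are all hypothetical illustrations, not something the group agreed on.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # Dividing by (length / 1e3) and (total / 1e6) is a factor of 1e9 overall.
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def hit_class(num_locations):
    """Label a feature by how many genomic locations its reads aligned to.

    Hypothetical labels that a genome browser track could map to colors.
    """
    if num_locations <= 0:
        return "unaligned"
    return "unique" if num_locations == 1 else "multi"


def tabular_row(transcript_id, read_count, length_bp, total_mapped, num_locations):
    """Build one tab-separated row: transcript id, RPKM, hit class."""
    value = rpkm(read_count, length_bp, total_mapped)
    return "\t".join([transcript_id, f"{value:.3f}", hit_class(num_locations)])


if __name__ == "__main__":
    # 100 reads on a 2 kb transcript in a library of 10 million mapped reads.
    print(tabular_row("tx1", 100, 2000, 10_000_000, 1))
```

Note that a transcript with zero reads gets an RPKM of 0 regardless of library size, so the value alone cannot distinguish "not expressed" from "not sequenced deeply enough". That is the normalization concern raised above, and it is one reason to carry total mapped reads through the pipeline as metadata.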
Item 2. Matt Vaughn - Integration with DataViz
- Given the output we have defined for the current NGS workflows, what visualization methods are needed/appropriate? What Data Integration tasks will be required for NGS outputs to act as sources for Viz tools?
Item 3. Tom Brutnell - Prioritizing NextGen activities
- first pipeline: SNP detection, RPKM
- necessary to outline versions that will be wanted
- what are the analyses that we want to enable?
- initially circulate within working groups to get feedback about what more is wanted/needed
- Do we want to specify alternative paths for analyzing single samples (more aligners, polymorphic behavior)?
- Do we want to implement additional types (ChIP, assembly, etc.) of single-sample analysis?
- Do we want to work on multiple sample workflows (dealing with replicates, populations, etc)?
- Other possibilities...
- come to the Austin meeting with what types of visualization you want to see; tools that exist now are reference implementations
- what are the pieces of the visualization that you want to see?
- what would you like to see versus what you use today?