UHTS_20091030

iPG2P NextGen Sequencing

October 30, 2009, 1pm EDT

Attendees: Matt Vaughn, Damian Gessler, Tom Brutnell, Bob Schmitz, Steve Welch, Karla Gendler, Steve Rounsley

Action Items:

Brutnell will distribute the questionnaire posted on the wiki to get feedback
Gendler will set up and schedule a conference call to discuss polyploidy and pooled sequences

Agenda:

Item 1. Matt Vaughn - Implementing the NextGen pipeline

What metadata and experimental parameters need to be collected up front
- Matt has been writing some narratives to go with pipelines; document on Wiki that talks about fundamentals
- TB will pull out of Wiki and distribute as document, send to Chris Myers, Doreen Ware, and Ruth Grene
  - issue that came up is alignment score, has to carry through whole pipeline
  - in regards to modeling, some metadata has to be captured in reference to experiments
- list of what we need to capture: are we going to follow specific formats? MINSEQE is an emerging standard; could probably adapt MIAME to describe the experiment
- privacy and confidentiality of experiments might cause concerns
Dealing with polyploidy and pooled sequences (Need a separate conf call on this)
- most systems now deal only with diploids
- issues in SNP calling and representation format (dealing with gamete fusion)
- are there people who know more about this? Ed Buckler, MV, Matt Hudson, Steve Rounsley, Jermaine
- how do we extend what is already available?
Minimal output from transcriptomics
- what is needed in tabular output other than transcript id and RPKM
  - alignment score: need to know whether a read hits multiple locations or a single location; use this to color-code tracks on the genome browser
  - RPKM: visualizing these data sets; currently we are collapsing to gene level
    - RPKM: how do you interpret the absence of an alignment, and how do you normalize? The assumption being made is that people are sequencing to similar depths; RPKM is the standard adopted by the mammalian community
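
For reference, a minimal sketch of the RPKM calculation discussed above (reads per kilobase of transcript per million mapped reads). The function name, transcript ids, and counts are hypothetical and only illustrate the normalization; this is not the pipeline's implementation. Note that a transcript with no aligned reads gets an RPKM of 0, which by itself does not distinguish "not expressed" from "not sequenced deeply enough".

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical inputs: transcript id -> (reads aligned, transcript length in bp)
counts = {"tx1": (500, 2000), "tx2": (0, 1500)}
total_mapped = 10_000_000  # assumed total mapped reads in the sample

# Minimal tabular output: transcript id and RPKM
table = {tid: rpkm(n, length, total_mapped) for tid, (n, length) in counts.items()}
for tid, value in table.items():
    print(f"{tid}\t{value:.2f}")
```

Because the total number of mapped reads is in the denominator, values from samples sequenced to very different depths are only roughly comparable, which is the assumption raised in the discussion.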

Item 2. Matt Vaughn - Integration with DataViz

Given the output we have defined for the current NGS workflows, what visualization methods are needed/appropriate? What data integration tasks will be required for NGS outputs to act as sources for Viz tools?

Item 3. Tom Brutnell - Prioritizing NextGen activities

first pipeline: SNP detection, RPKM
necessary to outline versions that will be wanted
what are the analyses that we want to enable?
initially circulate within working groups to get feedback about what more is wanted/needed
- Do we want to specify alternative paths for analyzing single samples (more aligners, polymorphic behavior)?
- Do we want to implement additional types (ChIP, assembly, etc.) of single-sample analysis?
- Do we want to work on multiple sample workflows (dealing with replicates, populations, etc)?
- Other possibilities...
come to the Austin meeting with what types of visualization you want to see; tools that exist now are reference implementations
- what are the pieces of the visualization that you want to see?
- what would you like to see versus what you use today?