iPG2P NextGen Sequencing

October 30, 2009, 1pm EDT

Attendees: Matt Vaughn, Damian Gessler, Tom Brutnell, Bob Schmitz, Steve Welch, Karla Gendler, Steve Rounsley

Action Items:

  • Brutnell will distribute the questionnaire posted on the wiki to get feedback
  • Gendler will set up and schedule a conference call to discuss polyploidy and pooled sequences


Item 1. Matt Vaughn - Implementing the NextGen pipeline

  • What metadata and experimental parameters need to be collected up front
    • Matt has been writing some narratives to go with pipelines; document on Wiki that talks about fundamentals
    • TB will pull out of Wiki and distribute as document, send to Chris Myers, Doreen Ware, and Ruth Grene
      • an issue that came up is the alignment score, which has to carry through the whole pipeline
      • with regard to modeling, some metadata has to be captured in reference to experiments
    • list of what we need to capture: are we going to follow specific formats?  MinSeq (MINSEQE) is an emerging standard; could probably adapt MIAME to describe the experiment
    • privacy and confidentiality of experiments might cause concerns
  • Dealing with polyploidy and pooled sequences (Need a separate conf call on this)
    • most systems now deal only with diploids
    • issues in SNP calling and representation format (dealing with gamete fusion)
    • people who know more about this: Ed Buckler, MV, Matt Hudson, Steve Rounsley, Jermaine
    • how do we extend what is already available?
  • Minimal output from transcriptomics
    • what is needed in tabular output other than transcript ID and RPKM?
      • alignment score: need to know whether a read hit multiple locations or a single location; use this to color-code tracks on the genome browser
      • RPKM: for visualizing these data sets, we are currently collapsing to gene level
        • RPKM: how do you interpret the absence of an alignment, and how do you normalize?  The assumption being made is that people are sequencing to similar depths; RPKM is the standard adopted by the mammalian community
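
For reference, a minimal sketch of the tabular output discussed above, assuming the usual definition of RPKM (reads per kilobase of transcript per million mapped reads). The names `TranscriptRow`, `rpkm`, and `tabulate`, and the `hit_locations` field used to flag single versus multiple hits, are hypothetical and only illustrate the idea; this is not the pipeline's actual format.

```python
from dataclasses import dataclass


@dataclass
class TranscriptRow:
    # Hypothetical input record; field names are illustrative only.
    transcript_id: str
    length_bp: int       # transcript length in base pairs
    read_count: int      # reads aligned to this transcript
    hit_locations: int   # number of genomic locations the reads align to


def rpkm(read_count: int, length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length_bp and total_mapped_reads must be positive")
    # reads / (length in kb) / (mapped reads in millions)
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def tabulate(rows, total_mapped_reads):
    """Build one output row per transcript: ID, RPKM, and a hit class."""
    table = []
    for row in rows:
        table.append({
            "transcript_id": row.transcript_id,
            "rpkm": rpkm(row.read_count, row.length_bp, total_mapped_reads),
            # 'unique' vs 'multi' could drive color-coding in a browser track
            "hit_class": "unique" if row.hit_locations == 1 else "multi",
        })
    return table


rows = [
    TranscriptRow("tx1", 2000, 500, 1),
    TranscriptRow("tx2", 1000, 0, 3),
]
table = tabulate(rows, total_mapped_reads=10_000_000)
```

Note that a transcript with no aligned reads gets an RPKM of 0, which by itself cannot distinguish "not expressed" from "not sampled at this depth"; that is the interpretation question raised above.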

Item 2. Matt Vaughn - Integration with DataViz

  • Given the output we have defined for the current NGS workflows, what visualization methods are needed/appropriate? What data integration tasks will be required for NGS outputs to act as sources for Viz tools?

Item 3. Tom Brutnell - Prioritizing NextGen activities

  • first pipeline: SNP detection, RPKM
  • necessary to outline versions that will be wanted
  • what are the analyses that we want to enable?
  • initially circulate within working groups to get feedback about what more is wanted/needed
    • Do we want to specify alternative paths for analyzing single samples (more aligners, polymorphic behavior)?
    • Do we want to implement additional types (ChIP, assembly, etc.) of single-sample analysis?
    • Do we want to work on multiple sample workflows (dealing with replicates, populations, etc)?
    • Other possibilities...
  • come to the Austin meeting with what types of visualization you want to see; tools that exist now are reference implementations
    • what are the pieces of the visualization that you want to see
    • what would you like to see versus what you use today?