Workshop Outline-Overview from OSU team

OSU Summer Workshop in Bioinformatics: Disentangling Entangled Genomes

The Target

Students and faculty/other researchers with limited backgrounds in some or all aspects of bioinformatics. Participants will have backgrounds in life sciences, computer sciences, mathematics, or statistics, but are assumed to be novices in the area of bioinformatics.

The Goals

(1) Provide a general and basic introduction to the discipline of bioinformatics.

(2) Expose specialists in life sciences, computer science, mathematics, and statistics to the opportunities and challenges of incorporating bioinformatics, in particular the various -omics data and analyses, into their research.

            These goals will be met using hands-on, problem-solving exercises involving one or more exemplar data sets.  Exercises will bridge disciplines: biological questions should become understandable to individuals from computer science and mathematics backgrounds, and biologists should gain insights into the computational/mathematical aspects of the problems and solutions.

Workshop Structure

            The workshop will take place during one week (5-6 full days).  Interdisciplinary teams of 4-5 will work independently through sets of exercises, each addressing a particular bioinformatic challenge.  We envision a three-step process: (1) presentation and discussion of background information to prepare participants from different academic specialties, (2) guided brainstorming and problem-solving by the interdisciplinary teams, drawing on their complementary talents in biology, computer science, and mathematics and generating a common vocabulary for conceptualizing problems, and (3) programmed exercises that demonstrate and explain the underlying principles of existing bioinformatic tools for problem-solving.

The workshop will conclude with sessions on synthesis.  Teams may work on different data sets initially and synthesize their results at the end of the workshop.  There will be a general discussion of “grand challenges” in bioinformatics generally and in entangled genomes specifically.

The Data Set(s)

            Data will be obtained prior to the workshop.  Workshop leaders will have a full understanding of the nature of the data and of the analyses the data will support.

The data will consist of a sample of interacting organisms, e.g., a plant and its associated fungi, bacteria, and viruses.  Examples:  (1) focal plants, such as milkweed and switchgrass, for which genomic resources already exist; (2) metagenomic samples from plant roots and leaves and from soil; (3) transcriptomic data [MF1] from focal plants under varying conditions (e.g., water stress, cold stress, different light conditions, different tissues); (4) epigenetic data [MF2] under varying conditions [EL3] (e.g., different developmental stages).

These data will have been selected to address the following biological question: How can we determine how ecologically interacting species have shaped each other’s genomes?

Workshop Activities

 

  1. Discuss the biological and computational problems of entangled genomes.  Cover essential background concepts and find common ground/common vocabulary.  Cover background of cyberinfrastructure (data storage, computer resources, collaboration resources).
  2. Develop a “plan” for studying genomes: What tools are needed?  What computational resources are needed?  What do biologists, computer scientists, and mathematicians bring to the table?
  3. Primary data
    1. Introduction to data files and formats, scripting, infrastructure
    2. Exercise: Characterize the attributes of the pool of short read data, e.g., read quality, length, etc.
  4. Genome assembly [MF4]
    1. Exercise: Assemble reads into contigs
    2. Exercise: Assemble contigs into genomes using de novo, reference-guided, or hybrid methods
    3. Exercise: Characterize the assembled genomes statistically, e.g., read depth, coverage, SNPs, heterozygosity, etc.
    4. Exercise: Characterize the assembled genomes biologically, e.g., highly repetitive and middle repetitive fractions, coding fraction, etc. [MF5]
  5. Comparative genomics
    1. Exercise: Identify synteny and syntenic gene pairs within a genome and among genomes of related species
    2. Exercise: Identify patterns of genome structure evolution, e.g., whole genome duplication, fractionation, GC content shifts
    3. Exercise: Identify signatures of selection, e.g., synonymous and non-synonymous substitutions as signs of positive, purifying, and neutral selection
  6. Gene homology and annotation
    1. Exercise: Identify cryptic species in the read pool
    2. Exercise: Annotate genomic data, e.g., determine gene models, infer putative homology by reference to model organisms, apply GO terms, estimate number of paralogs in gene families, etc.
  7. Transcriptomes
    1. Exercise: Evaluate gene predictions from genomic data
    2. Exercise: Evaluate changes in regulation among tissues, individuals, environments
    3. Exercise: Determine gene networks
  8. Metagenomics
    1. Introduction to metagenomic samples
    2. Exercises: Characterize metagenomes and metatranscriptomes
  9. Synthesis: Compare genomes of interacting species
    1. Bring teams together
    2. Are there signatures of biases in GC content, codon usage, etc.?
    3. Are there signatures of horizontal gene transfer?
    4. Are there signatures of reciprocal natural selection?
    5. What are the new discoveries emerging from this workshop?
    6. Are the tools or algorithms for these signatures optimized? Could new tools be developed?
    7. For each participant: What is my potential role as a bioinformatician in solving problems like entangled genomes?
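
As an illustration of the scale of the primary-data exercise (characterizing read length and quality in a pool of short reads), a minimal Python sketch might look like the following. It is not part of the planned curriculum. It assumes four-line FASTQ records and Phred+33 quality encoding; the function name summarize_fastq and the two example reads are hypothetical and chosen only for illustration. It also reports the GC fraction, which relates to the GC-content questions in the comparative and synthesis activities.

```python
import io


def summarize_fastq(handle, phred_offset=33):
    """Summarize a FASTQ stream with four lines per record.

    Assumes Phred+33 quality encoding unless phred_offset says otherwise.
    Multi-line (wrapped) FASTQ records are not supported by this sketch.
    """
    n_reads = total_bases = total_quality = gc_bases = 0
    while True:
        header = handle.readline()
        if header == "":          # end of input
            break
        if not header.strip():    # tolerate blank lines between records
            continue
        seq = handle.readline().strip()
        handle.readline()         # '+' separator line, ignored
        qual = handle.readline().strip()
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record number {n_reads + 1}")
        n_reads += 1
        total_bases += len(seq)
        # Convert each quality character to its Phred score and accumulate.
        total_quality += sum(ord(ch) - phred_offset for ch in qual)
        upper = seq.upper()
        gc_bases += upper.count("G") + upper.count("C")
    if total_bases == 0:
        raise ValueError("no bases found in input")
    return {
        "reads": n_reads,
        "mean_length": total_bases / n_reads,
        "mean_quality": total_quality / total_bases,  # mean per-base Phred score
        "gc_fraction": gc_bases / total_bases,
    }


# Two made-up reads, purely for illustration.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCCAA\n+\n!!!!!!\n")
stats = summarize_fastq(example)
print(stats)
```

In practice teams would more likely use an established quality-control tool for this step; a hand-written summary like this one mainly helps participants from non-computational backgrounds see what such tools compute.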

 

Required Resources

 

  1. Cyberinfrastructure, e.g., iPlant accounts
  2. Instructor expertise
    1. Cyberinfrastructure
    2. Plant genome assembly and annotation
    3. Metagenomic genome assembly and annotation
    4. Plant transcriptomics (without networks or statistics, transcriptional profiling can be a fishing expedition)
    5. Plant metagenomics
    6. Gene networks

[MF1] Omit; too ambitious?

[MF2] Omit; too ambitious?

[EL3] No more ambiguous than any other -omic term... perhaps give an example?

[MF4] Add visualization/validation exercise.

[MF5] Annotation