2018-07-30 Genomics in Education Austin Community College ASMCUE
DNA Subway & Microbiome Workshop July 30-31, 2018 Austin Community College, Highland Campus
Dave Micklos and Bruce Nash, DNA Learning Center, Cold Spring Harbor Laboratory
Uwe Hilgert, BIO5 Institute, University of Arizona
Organizers: | David Micklos, Bruce Nash, and Jason Williams |
---|---|
Local Point of Contact: | Poornima Rao Poornima.rao@austincc.edu |
Trainers: | David Micklos, Bruce Nash nash@cshl.edu , and Uwe Hilgert |
Date: | July 30-31, 2018 |
Location: | Austin Community College - Highland Campus, 6101 Highland Campus Dr., Austin, TX, 78752 (Use North Entrance) |
Workshop Prep
1. CyVerse Account
- Get your free account at http://user.cyverse.org/. Please register with an institutional email account.
2. Laptop
Please bring your own Wi-Fi enabled laptop to the workshop. Make sure your laptop has the following:
- Internet Browser: Please have an up-to-date web browser (We strongly recommend Firefox, Safari, or Chrome; others may not work properly.)
3. DNA Subway
Please ensure that you can log on to DNA Subway: https://dnasubway.cyverse.org/
Please check that you can sign in to the Purple Line, which runs on a separate server from DNA Subway: https://ubiome-demo.ngrok.io/
4. DNA barcoding sample
Bring a small sample of plant tissue (a leaf/flower/etc.) or collect one on campus before the experiment.
Additional Readings and Assignments
- If you wish to bring sequence data for the microbiome analysis and did not get sequencing from us, please do so!
If you do and have Illumina Casava format sequence files, you can upload them to the CyVerse Data Store to make them available for analysis with DNA Subway. Cyberduck, a free application for Mac and PC, is a fast and relatively straightforward way to handle uploads and downloads:
- https://pods.iplantcollaborative.org/wiki/display/DS/Using+Cyberduck+for+Uploading+and+Downloading+to+the+Data+Store
- You will also need a file with your metadata mapped to your samples. We can work to correct errors in your files, but getting a head start on your metadata is a good idea.
- Qiime2 requires a specific metadata format, which is explained here: https://docs.qiime2.org/2018.6/tutorials/metadata/ A sample file can also be downloaded from the tutorial. Note that the file must follow the format, which does not allow spaces. Spaces between words are a very common reason that mapping files fail to validate. Qiime2 requires a tab-delimited file (.tsv). These can be saved from within Excel (using "Save As") and from many other spreadsheet or text editors, but the file extension may need to be changed. This will be covered during the workshop.
- DNA Subway Guide: https://cyverse-dnasubway-guide.readthedocs-hosted.com/en/latest/
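The metadata requirements described above (a tab-delimited file with no spaces) can be checked ahead of the workshop with a short script. The following is a minimal sketch in Python, not the Qiime2 validator itself: the function name `check_metadata` and the specific checks (column counts, spaces, duplicate sample IDs) are assumptions chosen to illustrate the guidance on this page, and Qiime2 applies further rules that are documented in its metadata tutorial.

```python
def check_metadata(text):
    """Return a list of problems found in tab-delimited metadata text.

    Hypothetical helper for illustration; it only checks the points
    mentioned on this page and is not a substitute for Qiime2 validation.
    """
    problems = []
    header_width = None   # number of columns in the header row
    seen_ids = set()      # sample IDs (first column) seen so far

    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        cells = line.split("\t")

        if header_width is None:
            # First non-blank line is treated as the header row.
            header_width = len(cells)
            if header_width < 2:
                problems.append(
                    f"line {line_no}: only one column found; "
                    "is the file tab-delimited?"
                )
        else:
            if len(cells) != header_width:
                problems.append(
                    f"line {line_no}: expected {header_width} columns, "
                    f"found {len(cells)}"
                )
            if cells[0] in seen_ids:
                problems.append(f"line {line_no}: duplicate sample ID {cells[0]!r}")
            seen_ids.add(cells[0])

        # Flag any cell that contains a space, per the guidance above.
        for cell in cells:
            if " " in cell:
                problems.append(f"line {line_no}: value {cell!r} contains a space")

    return problems
```

To check a file on disk, read it and pass the text to the function, for example `check_metadata(open("metadata.tsv").read())`, where `metadata.tsv` is a placeholder file name. An empty list means none of these particular problems were found.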
Workshop Powerpoint Presentations:
- Workshop Context, Big Data, and Cyverse: Austin 1Bi Big Data and Cyverse.ppt
- The CURE Challenge: Austin 2 CURE Challenge.pptx
- DNA Barcoding: Austin 3 Barcoding.ppt
- Red Line Annotation: Austin 4 Red Line.ppt
- MaizeCODE Annotation: Austin 5 MaizeCODE Annotation.ppt
- Purple Line Metabarcoding: 6 Austin Purple Line BN.ppt
- Green Line RNA-Seq: 7 Austin Green Line BN.ppt
DNA barcoding and Blue Line resources:
- DNA Barcoding 101: https://www.dnabarcoding101.org/
- BOLD (Barcode Of Life Data System): http://www.boldsystems.org/
- Ten species in one: DNA barcoding reveals cryptic species in the neotropical skipper butterfly Astraptes fulgerator: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC522015/
- A botanical macroscope: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2722277/
- DNA barcoding Brooklyn (New York): A first assessment of biodiversity in Marine Park by citizen scientists: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0199015
- MUSCLE (MUltiple Sequence Comparison by Log-Expectation): https://www.ebi.ac.uk/Tools/msa/muscle/
- PHYLIP (the PHYLogeny Inference Package): http://evolution.genetics.washington.edu/phylip/general.html
Microbiome and Purple Line resources:
- Qiime2 documentation (background for the Purple Line): https://qiime2.org/
- Metadata_Table_Corrected
- Emperor (implemented in the Purple Line for data exploration): https://biocore.github.io/emperor/
- Qiime2 feature classifier reference: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5956843/
- A video on Adapterama indexing:
- A review of global microbiome efforts and challenges: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4895116/
- A review of the microbiomes of interiors, with a discussion of variation between studies: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4604073/
- Illumina video on microbiome analyses https://www.youtube.com/watch?v=1uZtCMY-yEw
- Illumina video on MiSeq sequencing by synthesis:
RNAseq and Green Line resources:
- Website for our RNAseq program (now finished) with teaching resources, faculty profiles, and background information: http://www.rnaseqforthenextgeneration.org/
- Link to NIH resource page for teaching microbiomes: https://www.genome.gov/27552808/teaching-the-microbiome/
- Link to the Earth Microbiome project, with protocols, publications, and other resources: http://www.earthmicrobiome.org/
- Description of kallisto, a fast, accurate transcript quantification program: https://pachterlab.github.io/kallisto/about
- Blogs about RNAseq analysis:
Maize annotation and Red Line resources:
- Apollo demo: http://genomearchitect.org/demo/
- Arabidopsis Chromosome 1 EST FASTA files: https://de.cyverse.org/dl/d/A9BED6DE-83F3-4F38-A3FE-0AA0A9AF5D53/EST_Chr1_3729956..3804955.fasta
- Maize Annotator URL: http://data.maizecode.org/apollo (Requires login)
- MaizeCODE Apollo Guide
MaizeCODE: http://www.maizecode.org
- Unveiling the complexity of the maize genome https://www.nature.com/articles/ncomms11708
MaizeCODE: Annotation: http://data.maizecode.org/apollo/annotator/index
MaizeCODE: Classical Gene Families: https://docs.google.com/spreadsheets/d/18M3zP0WY5RbOAfjM1IQpInZCJetz_l4R2SajC0CsPdg/edit?usp=sharing
Maize Version 4 and RNA Evidence
- Maize Genome Version 4: https://www.nature.com/articles/nature22971
- Seedling transcriptome from RNA-Seq: https://www.nature.com/articles/srep04519
- Trinity assembly from 95 RNA-Seq experiments: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4280997/
MaizeCODE Apollo Track Legends:
B73v4_protein_coding_genes.gff: protein coding genes with AED and QI
est2genome_FLC.gff: Full-length cDNAs from GenBank
est2genome_GZT.gff: Aligned transcripts from the v3 annotation (not used to generate the v4 annotations)
est2genome_ISO.gff: Iso-Seq data, full-length isoform sequencing using a PacBio single-molecule sequencer
est2genome_MS.gff: Trinity-assembled Illumina RNA-Seq data from seedling (high depth)
est2genome_TR.gff: 95 Trinity-assembled Illumina RNA-Seq experiments (complexity reduced using cdhit)
protein2genome_AT.gff: Arabidopsis proteins
protein2genome_BD.gff: Brachypodium proteins
protein2genome_GZP.gff: v3 proteins (not used to generate the v4 annotations)
protein2genome_OS.gff: Rice proteins
protein2genome_SB.gff: Sorghum proteins
protein2genome_SI.gff: Setaria proteins
augustus_masked.gff: Gene predictions from augustus
fgenesh_masked.gff: Gene predictions from fgenesh
More detail:
- The est2genome_MS.gff file contains polished alignments of assembled mRNA-seq data. The original publication is here: https://www.nature.com/articles/srep04519
- The est2genome_TR.gff file is a little more involved. We started with 95 mRNA-seq experiments that were publicly available in GenBank. Each was assembled individually using Trinity, as described in this publication: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4280997/. Because of the large number of experiments, there were many redundant transcripts, which took a long time to align. I used cdhit to filter out the redundant transcripts. This file contains the polished alignments of the non-redundant transcripts.
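To make the idea of filtering redundant transcripts concrete, here is a toy sketch in Python. It is not cdhit and does not reproduce how the track was built: cdhit clusters sequences by similarity above an identity threshold, whereas this simplified stand-in only drops a transcript when it is an exact substring of a longer transcript already kept. The function name `filter_redundant` and the example sequences are hypothetical.

```python
def filter_redundant(transcripts):
    """Greedy, longest-first redundancy filter (toy illustration only).

    transcripts: dict mapping a transcript name to its sequence.
    A sequence is kept unless it is identical to, or fully contained in,
    a sequence that has already been kept.
    """
    kept = {}
    # Process longest sequences first; break ties by name for stable output.
    ordered = sorted(transcripts.items(), key=lambda item: (-len(item[1]), item[0]))
    for name, seq in ordered:
        if not any(seq in longer for longer in kept.values()):
            kept[name] = seq
    return kept


# Hypothetical example: t2 is contained in t1, and t3 duplicates t1.
example = {
    "t1": "ATGGCGTACGTTAG",
    "t2": "GCGTACG",
    "t3": "ATGGCGTACGTTAG",
    "t4": "TTTTCCCC",
}
non_redundant = filter_redundant(example)
```

In this example only `t1` and `t4` remain. Real assemblies contain near-duplicates with mismatches and partial overlaps, which is why a similarity-based tool is used in practice.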
Workshop Agenda
Monday, July 30
9:00 am Welcome, Logistics, Introductions
9:30 am Workshop Objective: Course-Based Undergraduate Research CyVerse and DNA Subway (Micklos)
10:00 am Blue Line 1: DNA Barcoding, DNA Extraction and PCR (Micklos)
11:00 am Purple Line 1: Microbiomes, Data Set-up, De-Multiplexing (Nash)
12:45 pm Lunch
1:30 pm Blue Line 2: DNA Barcoding, Gel Electrophoresis and Analysis (Micklos) [1-10, 11-20, 21-25]
3:00 pm Red Line 1: MaizeCODE, Guided Gene Annotation (Micklos and Hilgert)
4:45 pm Purple Line 2: Trimming
5:00 pm Dismissal
Tuesday, July 31
9:00 am Purple Line 3: Rarefaction, Core Analysis (Nash)
10:15 am Red Line 2: Independent Gene Annotation (Micklos and Hilgert)
12:15 pm Lunch
1:00 pm Purple Line 4: Analysis (Nash)
2:30 pm Green Line: Differential Gene Expression: RNA-Seq Analysis (Nash)
3:30 pm Evaluation and Online Follow-up
4:00 pm Dismissal
Other Websites and Links and Resources
Post Workshop Survey
Please complete this survey at the conclusion of the workshop:
Post Workshop DNA Sequencing Results
Instructions on retrieving your results will be posted here after the workshop.
Results