1KP_Pilot_Study_DeepGreen
1KP data analysis working group
In November 2009, a NESCent/iPlant-sponsored 1KP analysis workshop was held in Phoenix to bring together members of the iPlant Tree of Life (iPToL) Tree Reconciliation working group, the 1KP project, and experts in phylogeny estimation using large, multi-gene data sets. The 1000 Plant Transcriptome Sequencing Initiative (www.onekp.com) aims to resolve relationships across the green plant phylogeny in order to elucidate the processes contributing to diversification and biological innovations, including the origins of multicellularity, the colonization of land, the evolution of vascular systems, and the origins of seeds and flowers. In pursuit of these goals, the project is generating an unparalleled plant sequence database for investigating the evolution of gene families, regulatory networks and biosynthetic pathways. In collaboration with iPlant, the 1KP project plans to make all transcriptome sequences, gene trees, species trees and tree reconciliations available to the plant science community through an iPlant Discovery Environment.
As part of the iPlant Assembling the Tree of Life (iPToL) grand challenge project, the Tree Reconciliation Working Group and collaborators are developing pipelines for the analysis of gene families within the context of organismal phylogenies. The working group will first analyze a pilot set of strategically placed streptophytic algae and land plant transcriptomes (see table) being generated by the 1KP project, using it as a focal point for developing workflows that circumscribe gene families and estimate gene trees and species trees. Results will be shared with the larger community of iPToL and 1KP collaborators and prepared for publication.
Major systematics questions to be addressed in the pilot project:
- Which algal lineage(s) are sister to the land plants?
- Do mosses, hornworts and liverworts form a clade?
- To what degree is the timing of gene duplication events correlated across gene families?
- Do we see diversification of gene families and/or functional groups associated with the origin of land plants, vascular plants, seed plants and/or flowering plants?
- Are gene trees affected by sparse taxon sampling?
- Given computational limitations (see below), is it feasible to go even further back in time to resolve the earliest branching events in the history of green algae?
Some of the major computational questions include:
- Given existing tools, how can we best infer species trees from reconciliation of unrooted and often poorly resolved (very deeply branching) gene trees?
- What are the limits of the power of currently available gene tree/species tree reconciliation methods?
- How do we summarize the uncertainty in the gene tree reconciliation in a way that captures uncertainty in the species and gene tree topologies as well as in the rooting of both?
- How do we visualize the results of gene tree/species tree reconciliations?
- Where are the computational bottlenecks and how do we scale these analyses up?
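To make the reconciliation questions above concrete, here is a minimal Python sketch of one classic approach, LCA (least common ancestor) mapping, which labels a gene-tree node as a duplication when it maps to the same species-tree node as one of its children. This is an illustration only, not the working group's pipeline. It assumes rooted, fully resolved trees written as nested tuples, whereas the pilot data will often yield unrooted and poorly resolved gene trees. The function names, gene labels and the three-taxon species tree are hypothetical.

```python
def clades(tree):
    """Return (leaf set of `tree`, list of leaf sets for every clade in it)."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
        return leaves, [leaves]
    leaves = frozenset()
    all_clades = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        all_clades.extend(child_clades)
    all_clades.append(leaves)
    return leaves, all_clades


def lca(species_clades, species):
    """Smallest species-tree clade that contains every name in `species`."""
    return min((c for c in species_clades if species <= c), key=len)


def count_duplications(gene_tree, species_tree, gene_to_species):
    """Count gene-tree nodes that LCA mapping labels as duplications."""
    _, species_clades = clades(species_tree)
    duplications = 0

    def visit(node):
        nonlocal duplications
        if isinstance(node, str):  # leaf: map the gene to its own species
            return lca(species_clades, frozenset([gene_to_species[node]]))
        child_maps = [visit(child) for child in node]
        mapped = lca(species_clades, frozenset().union(*child_maps))
        # Duplication: the node maps to the same clade as one of its children.
        if mapped in child_maps:
            duplications += 1
        return mapped

    visit(gene_tree)
    return duplications


# Hypothetical toy input: two gene copies in each of two angiosperms.
species_tree = (("Arabidopsis", "Oryza"), "Physcomitrella")
gene_tree = ((("at1", "os1"), ("at2", "os2")), "pp1")
gene_to_species = {
    "at1": "Arabidopsis", "at2": "Arabidopsis",
    "os1": "Oryza", "os2": "Oryza",
    "pp1": "Physcomitrella",
}
n_dups = count_duplications(gene_tree, species_tree, gene_to_species)
print(n_dups)
```

For this toy gene tree the sketch reports a single duplication, placed on the branch leading to the Arabidopsis + Oryza clade. Handling unrooted trees would require, for example, trying each rooting and keeping the one that minimizes the duplication count, which is part of what makes the questions above computationally demanding.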
Our experience with the pilot analysis will form the foundation for much larger analyses of the full 1000-transcriptome data set to be generated over the next 24 months. Most importantly, we aim to address the computational challenges anticipated in analyses of the complete 1KP data set.
Table: Taxa to be included in the pilot dataset from the 1KP project
| No. | Clade | Order | Family (class for algae) | Species | 1KP Code | Comments |
|---|---|---|---|---|---|---|
| 1 | Basalmost angiosperms | Austrobaileyales | Illiciaceae/Schisan. | Amborella trichopoda | URDJ | Combine 1KP and AAGP data; update 1KP assembly |
| 2 | Basalmost angiosperms | Austrobaileyales | Illiciaceae/Schisan. | Nuphar advena | WTKZ | Combine 1KP and AAGP data |
| 3 | Basalmost angiosperms | Austrobaileyales | Illiciaceae/Schisan. | Kadsura heteroclita | NWMY | |
| 4 | Magnoliid | Piperales | Piperaceae | Houttuynia cordata | CSSK | |
| 5 | Magnoliid | Piperales | Aristolochiaceae | Saruma henryi | QDVW | Combine 1KP and FGP data |
| 6 | Magnoliid | Magnoliales | Magnoliaceae | Liriodendron tulipifera | - | AAGP data |
| 7 | Magnoliid | Magnoliales | Magnoliaceae | Persea americana | - | AAGP data |
| 8 | Chloranthales | Chloranthales | Chloranthaceae | Sarcandra glabra | OSHQ | |
| 9 | Basal Eudicots | Ranunculales | Berberidaceae | Podophyllum peltatum | WFBF | |
| 10 | Basal Eudicots | Ranunculales | Papaveraceae | Eschscholzia californica | multi-library | Combine 1KP and FGP data; multi-1kp-library assembly in progress |
| 11 | Basal Eudicots | Ranunculales | Ranunculaceae | Aquilegia formosa X pubescens | - | ESTs in GenBank |
| 12 | Core Eudicots | Caryophyllales | Amaranthaceae | Kochia scoparia | WGET | |
| 13 | Core Eudicots/Rosids | Brassicales | Brassicaceae | Arabidopsis thaliana | - | Annotated Genome |
| 14 | Core Eudicots/Rosids | Brassicales | Brassicaceae | Arabidopsis lyrata | - | Annotated Genome |
| 15 | Core Eudicots/Rosids | Brassicales | Caricaceae | Carica papaya | - | Annotated Genome |
| 16 | Core Eudicots/Rosids | Malpighiales | Salicaceae | Populus trichocarpa | - | Annotated Genome |
| 17 | Core Eudicots/Rosids | Malpighiales | Euphorbiaceae | Ricinus communis | - | Annotated Genome |
| 18 | Core Eudicots/Rosids | Malpighiales | Euphorbiaceae | Manihot esculenta | - | Annotated Genome |
| 19 | Core Eudicots/Rosids | Fabales | Fabaceae | Medicago truncatula | - | Annotated Genome |
| 20 | Core Eudicots/Rosids | Fabales | Fabaceae | Glycine max | - | Annotated Genome |
| 21 | Core Eudicots/Rosids | Cucurbitales | Cucurbitaceae | Cucumis sativus | - | Annotated Genome |
| 22 | Core Eudicots/Rosids | Vitales | Vitaceae | Vitis vinifera | - | Annotated Genome |
| 23 | Core Eudicots/Rosids | Zygophyllales | Zygophyllaceae | Larrea divaricata | UDUT | |
| 24 | Core Eudicots/Rosids | Rosales | Urticaceae | Boehmeria nivea | ACFP | |
| 25 | Core Eudicots/Rosids | Malvales | Malvaceae | Hibiscus cannabinus | OLXF | |
| 26 | Core Eudicots/Asterids | Gentianales | Apocynaceae | Allamanda cathartica | MGVU | awaiting assembly of top-off |
| 27 | Core Eudicots/Asterids | Gentianales | Apocynaceae | Catharanthus roseus | UOYN | |
| 28 | Core Eudicots/Asterids | Lamiales | Lamiaceae | Rosmarinus officinalis | FDMM | |
| 29 | Core Eudicots/Asterids | Lamiales | Phrymaceae | Mimulus guttatus | - | Annotated Genome |
| 30 | Core Eudicots/Asterids | Solanales | Convolvulaceae | Ipomoea purpurea | multi-library | multi-library assembly in progress |
| 31 | Core Eudicots/Asterids | Ericales | Ebenaceae | Diospyros malabarica | KVFU | |
| 32 | Core Eudicots/Asterids | Asterales | Asteraceae | Inula helenium | AFQQ | |
| 33 | Core Eudicots/Asterids | Asterales | Asteraceae | Tanacetum parthenium | DUQG | |
| 34 | Monocots | Acorales | Acoraceae | Acorus americanus | - | MonAToL |
| 35 | Monocots | Dioscoreales | Dioscoreaceae | Dioscorea villosa | OCWZ | |
| 36 | Monocots | Liliales | Colchicaceae | Colchicum autumnale | NHIX | |
| 37 | Monocots | Liliales | Smilacaceae | Smilax bona-nox | MWYQ | Sequencing in progress |
| 38 | Monocots | Asparagales | Asparagaceae | Yucca filamentosa | ICNN | |
| 39 | Monocots/Commelinids | Arecales | Arecaceae | Chamaedorea seifrizii | - | MonAToL |
| 40 | Monocots/Commelinids | Poales | Poaceae | Zea mays | - | Annotated Genome |
| 41 | Monocots/Commelinids | Poales | Poaceae | Sorghum bicolor | - | Annotated Genome |
| 42 | Monocots/Commelinids | Poales | Poaceae | Brachypodium distachyon | - | Annotated Genome |
| 43 | Monocots/Commelinids | Poales | Poaceae | Oryza sativa | - | Annotated Genome |
| 44 | Gymnosperms | Pinales | Taxaceae | Taxus baccata | WWSS | |
| 45 | Gymnosperms | Pinales | Podocarpaceae | Prumnopitys andina | EGLZ | |
| 46 | Gymnosperms | Pinales | Sciadopityaceae | Sciadopitys verticillata | YFZK | |
| 47 | Gymnosperms | Pinales | Cupressaceae | Juniperus scopulorum | XMGP | |
| 48 | Gymnosperms | Pinales | Cupressaceae | Cunninghamia lanceolata | OUOI | |
| 49 | Gymnosperms | Pinales | Pinaceae | Pinus taeda | - | Dendrome |
| 50 | Gymnosperms | Pinales | Pinaceae | Cedrus libani | GGEA | |
| 51 | Gymnosperms | Gnetales | Gnetaceae | Gnetum montanum | GTHK | combine with NY Consortium data? same species? |
| 52 | Gymnosperms | Gnetales | Welwitschiaceae | Welwitschia mirabilis | - | FGP set |
| 53 | Gymnosperms | Ephedrales | Ephedraceae | Ephedra sinica | VDAO | additional reads? |
| 54 | Gymnosperms | Cycadales | Cycadaceae | Cycas micholitzii | XZUY | NY Consortium data? same species? |
| 55 | Gymnosperms | Cycadales | Zamiaceae | Zamia vazquezii | - | FGP/AAGP data |
| 56 | Gymnosperms | Ginkgoales | Ginkgoaceae | Ginkgo biloba | SGTW | Combine with NY Consortium data |
| 57 | Moniliformopses | Osmundales | Osmundaceae | Osmunda cinnamomea | | No RNA at BGI; Replacement? Barker data? |
| 58 | Moniliformopses | Marattiales | Marattiaceae | Angiopteris evecta | NHCM | |
| 59 | Moniliformopses | Psilotales | Psilotaceae | Psilotum nudum | QVMR | |
| 60 | Moniliformopses | Filicales | Cyatheaceae | Cyathea (=Alsophila) spinulosa | GANB | |
| 61 | Moniliformopses | Polypodiales | Pteridaceae | Cryptogramma acrostichoides | | No RNA at BGI; Replacement? Barker data? |
| 62 | Moniliformopses | Polypodiales | Pteridaceae | Asplenium rhizophyllum | KJZG | update 1KP assembly |
| 63 | Moniliformopses | Equisetales | Equisetaceae | Equisetum sp. | CAPN | awaiting assembly of top-off |
| 64 | Lycopods | Lycopodiales | Lycopodiaceae | Huperzia squarrosa | GAON | |
| 65 | Lycopods | Selaginellales | Selaginellaceae | Selaginella moellendorffii | - | Annotated Genome |
| 66 | Bryophyta | Polytrichales | Polytrichaceae | Polytrichum commune | SZYG | gametophyte |
| 67 | Bryophyta | Sphagnales | Sphagnaceae | Sphagnum lescurii | GOWD | |
| 68 | Bryophyta | Funariales | Funariaceae | Physcomitrella patens | - | Annotated Genome |
| 69 | Marchantiophyta | Marchantiales | Marchantiaceae | Marchantia polymorpha | JPYU | |
| 70 | Marchantiophyta | Marchantiales | Marchantiaceae | Marchantia emarginata | TFYI | |
| 71 | Anthocerotophyta | Anthocerotales | Anthocerotaceae | Nothoceros aenigmaticus | DXOU | |
| 72 | Anthocerotophyta | Anthocerotales | Anthocerotaceae | Anthoceros | IQJU | very few large scaffolds |
| 73 | Streptophytic Green Algae | | Mesostigmatophyceae | Mesostigma viride | KYIO | |
| 74 | Streptophytic Green Algae | | Mesostigmatophyceae | Chlorokybus atmophyticus | AZZW | update 1KP assembly |
| 75 | Streptophytic Green Algae | | Mesostigmatophyceae | Spirotaenia minuta | NNHQ | |
| 76 | Streptophytic Green Algae | | Klebsormidiophyceae | Klebsormidium subtile | FQLP | |
| 77 | Streptophytic Green Algae | | Klebsormidiophyceae | Hormidiella sp. | | New |
| 78 | Streptophytic Green Algae | | Charophyceae | Chara vulgaris | MWXT | |
| 79 | Streptophytic Green Algae | | Coleochaetophyceae | Chaetosphaeridium globosum | DRGY | |
| 80 | Streptophytic Green Algae | | Coleochaetophyceae | Coleochaete scutata | VQBJ | |
| 81 | Streptophytic Green Algae | | Coleochaetophyceae | Coleochaete orbicularis | - | Timme & Delwiche paper |
| 82 | Streptophytic Green Algae | | Zygnematophyceae | Cosmarium broomei | HIDG | awaiting assembly of top-off |
| 83 | Streptophytic Green Algae | | Zygnematophyceae | Netrium digitus | FFGR | |
| 84 | Streptophytic Green Algae | | Zygnematophyceae | Spirogyra sp. | HAOX | |
| 85 | Streptophytic Green Algae | | Zygnematophyceae | Spirogyra pratensis | - | Timme & Delwiche paper |
Analysis Pipeline and Data Access
Analysis of transcriptome sets for individual species - Collaborators have access to web-accessible databases for BLAST and annotation term searches.
Phylogenomic Analyses - Coding sequences extracted from transcriptome sets passing quality control will be sorted into gene families and used for estimations of gene trees, species trees and tree reconciliations. Collaborators will have access to alignments and trees.
Transcript Annotation
In support of all 1KP subprojects, all transcript assemblies are or will be accessible in the following ways:
- Assemblies and reads are available to contributors through download sites at the Universities of Texas (TACC) and Alberta (WestGrid).
- The results of prerun BLAST searches will be available through a project website. These results can be searched for annotation terms.
- BLAST searches can be performed on assemblies for each taxon through a project website.
- As described above (see Phylogenomic Analyses), assemblies will be sorted into gene families based on similarity to plant genes in the NCBI RefSeq database http://www.ncbi.nlm.nih.gov/RefSeq/ and to genes from annotated plant genomes that are not in RefSeq. Sequence alignments and gene trees for each family will be available to all 1KP collaborators for ortholog identification.
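As a rough illustration of the family-sorting step, the Python sketch below assigns each assembled transcript to the gene family of its best-scoring reference hit. It is not the project's actual pipeline. It assumes search results in the standard 12-column BLAST tabular format (`-outfmt 6`) and a lookup table from reference gene to family. The function name, the E-value cutoff, and all sequence and family identifiers are hypothetical.

```python
from collections import defaultdict


def assign_families(blast_lines, ref_to_family, max_evalue=1e-10):
    """Group query transcripts by the family of their best reference hit.

    `blast_lines` are rows of BLAST tabular output: qseqid, sseqid, pident,
    length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}  # query id -> (bitscore, subject id)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        # Skip weak hits and hits to references with no family assignment.
        if evalue > max_evalue or subject not in ref_to_family:
            continue
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)

    families = defaultdict(list)
    for query, (_, subject) in best.items():
        families[ref_to_family[subject]].append(query)
    return {family: sorted(members) for family, members in families.items()}


# Hypothetical reference genes, families and transcript (scaffold) names.
ref_to_family = {"refA": "family_1", "refB": "family_2"}
blast_lines = [
    "scaf_1\trefA\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-80\t250.0",
    "scaf_1\trefB\t60.0\t200\t80\t0\t1\t200\t1\t200\t1e-20\t90.0",
    "scaf_2\trefB\t90.0\t400\t40\t0\t1\t400\t1\t400\t1e-120\t400.0",
    "scaf_3\trefA\t40.0\t100\t60\t0\t1\t100\t1\t100\t1e-3\t35.0",
]
families = assign_families(blast_lines, ref_to_family)
print(families)
```

In this toy input, `scaf_1` goes to `family_1` because its hit to `refA` has the higher bit score, and `scaf_3` is left unassigned because its only hit fails the E-value cutoff. Best-hit assignment is only one of several ways to circumscribe gene families; clustering-based approaches may behave differently for large, fast-evolving families.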