1KP_Pilot_Study_DeepGreen
1KP Data Analysis Working Group
In November 2009, a NESCent/iPlant-sponsored 1KP analysis workshop was held in Phoenix to bring together members of the iPlant Tree of Life (iPToL) Tree Reconciliation Working Group, members of the 1KP project, and experts in phylogeny estimation using large, multi-gene data sets. The 1000 Plant Transcriptome Sequencing Initiative (www.onekp.com) aims to resolve relationships across the green plant phylogeny in order to elucidate the processes contributing to diversification and to biological innovations, including the origin of multicellularity, the colonization of land, the evolution of vascular systems, and the origins of seeds and flowers. In pursuit of these goals, the project is generating an unparalleled plant sequence database for investigating the evolution of gene families, regulatory networks and biosynthetic pathways. In collaboration with iPlant, the 1KP project plans to make all transcriptome sequences, gene trees, species trees and tree reconciliations available to the plant science community through an iPlant Discovery Environment.
As part of the iPlant Assembling the Tree of Life (iPToL) grand challenge project, the Tree Reconciliation Working Group and collaborators are developing pipelines for the analysis of gene families within the context of organismal phylogenies. The working group will first analyze a pilot set of strategically placed streptophyte algal and land plant transcriptomes (see table) being generated by the 1KP project. This pilot set will serve as a focal point for developing workflows aimed at circumscribing gene families and estimating gene trees and species trees. Results will be shared with the larger community of iPToL and 1KP collaborators and prepared for publication.
Major systematics questions to be addressed in the pilot project:
Which algal lineage(s) are sister to the land plants?
Do mosses, hornworts and liverworts form a clade?
To what degree is the timing of gene duplication events correlated across gene families?
Do we see diversification of gene families and/or functional groups associated with the origin of land plants, vascular plants, seed plants and/or flowering plants?
Are gene trees affected by sparse taxon sampling?
Given computational limitations (see below), is it feasible to go even further back in time to resolve the earliest branching events in the history of green algae?
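Several of the questions above (for example, whether mosses, hornworts and liverworts form a clade) reduce to a monophyly test on an estimated tree. The following is a minimal sketch of such a test, assuming a rooted tree stored as nested Python tuples with leaf names as strings. The function names and both toy topologies are hypothetical illustrations, not results or tools from the pilot project.

```python
from typing import FrozenSet, Iterable, Set, Tuple, Union

# A rooted tree is either a leaf name or a tuple of subtrees (assumed representation).
Tree = Union[str, Tuple["Tree", ...]]


def leaf_sets(tree: Tree, out: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Return the leaves below `tree`; record the leaf set of every internal node in `out`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.add(leaves)
    return leaves


def is_monophyletic(tree: Tree, taxa: Iterable[str]) -> bool:
    """True if `taxa` is exactly the leaf set of some node in the rooted tree."""
    wanted = frozenset(taxa)
    clades: Set[FrozenSet[str]] = set()
    all_leaves = leaf_sets(tree, clades)
    if len(wanted) == 1:
        # A single leaf is trivially a clade if it is present in the tree.
        return wanted <= all_leaves
    return wanted in clades


# Two toy topologies, for illustration only (not estimates from any data).
toy_a = ("alga", (("moss", ("liverwort", "hornwort")), ("lycophyte", "angiosperm")))
toy_b = ("alga", ("liverwort", ("moss", ("hornwort", ("lycophyte", "angiosperm")))))

bryophytes = ["moss", "liverwort", "hornwort"]
print(is_monophyletic(toy_a, bryophytes))  # True: the three taxa share an exclusive ancestor
print(is_monophyletic(toy_b, bryophytes))  # False: the three taxa form a grade
```

Because the test depends on the position of the root, an unrooted gene tree would need to be rooted first, or the check would need to be rephrased in terms of bipartitions.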
Some of the major computational questions include:
Given existing tools, how can we best infer species trees from reconciliation of unrooted and often poorly resolved (very deeply branching) gene trees?
What are the limits of the power of currently available gene tree/species tree reconciliation methods?
How do we summarize the uncertainty in a gene tree reconciliation in a way that captures uncertainty in the species and gene tree topologies as well as in the rooting of both?
How do we visualize the results of gene tree/species tree reconciliations?
Where are the computational bottlenecks and how do we scale these analyses up?
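To make the reconciliation questions above concrete, here is a minimal sketch of one classical approach, least-common-ancestor (LCA) mapping, which labels a gene tree node as a duplication when it maps to the same species tree node as one of its children. It is not necessarily the method the working group will adopt. It assumes rooted trees stored as nested tuples, gene tree leaves labeled by their species name, and a fixed species tree; the function names and the three-species toy example are hypothetical.

```python
from typing import FrozenSet, List, Tuple, Union

# A rooted tree is either a leaf name or a tuple of subtrees (assumed representation).
Tree = Union[str, Tuple["Tree", ...]]


def species_clades(tree: Tree, out: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Collect the leaf set of every species tree node (leaves included) in `out`."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(species_clades(child, out) for child in tree))
    out.append(leaves)
    return leaves


def reconcile(gene_tree: Tree, clades: List[FrozenSet[str]]) -> Tuple[FrozenSet[str], int]:
    """Return (species clade the gene tree node maps to, duplications at or below it)."""
    if isinstance(gene_tree, str):
        # A gene tree leaf maps to the species it was sampled from.
        return frozenset([gene_tree]), 0
    children = [reconcile(child, clades) for child in gene_tree]
    species_here = frozenset().union(*(mapped for mapped, _ in children))
    # LCA mapping: the smallest species clade containing every species seen below this node.
    # Raises ValueError if a gene tree leaf names a species absent from the species tree.
    mapped = min((c for c in clades if species_here <= c), key=len)
    duplications = sum(count for _, count in children)
    if any(child_map == mapped for child_map, _ in children):
        duplications += 1  # node and a child map to the same species node: duplication
    return mapped, duplications


# Toy example with three hypothetical species A, B and C.
clades: List[FrozenSet[str]] = []
species_clades(("A", ("B", "C")), clades)

_, congruent = reconcile(("A", ("B", "C")), clades)
_, conflicting = reconcile((("A", "B"), ("B", "C")), clades)
print(congruent)    # 0 duplications: gene tree matches the species tree
print(conflicting)  # 1 duplication: inferred at the gene tree root
```

This version scans every species clade at every gene tree node, which is adequate for a sketch but scales poorly; data sets of the size anticipated here would call for a constant-time LCA structure and for repeating the mapping over many candidate rootings and topologies.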
Our experience with the pilot analysis will form the foundation for much larger analyses of the full 1000-transcriptome data set to be generated over the next 24 months. Most importantly, we will aim to address the computational challenges anticipated in analyses of the complete 1KP data set.
Table: Taxa to be included in the pilot dataset from the 1KP project
| Clade | Order | Family | Species | 1KP Code | Comments |
1 | Basalmost angiosperms | Amborellales | Amborellaceae | Amborella trichopoda | URDJ | Combine 1KP and AAGP data; update 1KP assembly |
2 | Basalmost angiosperms | Nymphaeales | Nymphaeaceae | Nuphar advena | WTKZ | Combine 1KP and AAGP data |
3 | Basalmost angiosperms | Austrobaileyales | Illiciaceae/Schisan. | Kadsura heteroclita | NWMY | |
4 | Magnoliid | Piperales | Piperaceae | Houttuynia cordata | CSSK | |
5 | Magnoliid | Piperales | Aristolochiaceae | Saruma henryi | QDVW | Combine 1KP and FGP data |
6 | Magnoliid | Magnoliales | Magnoliaceae | Liriodendron tulipifera | - | AAGP data |
7 | Magnoliid | Laurales | Lauraceae | Persea americana | - | AAGP data |
8 | Chloranthales | Chloranthales | Chloranthaceae | Sarcandra glabra | OSHQ | |
9 | Basal Eudicots | Ranunculales | Berberidaceae | Podophyllum peltatum | WFBF | |
10 | Basal Eudicots | Ranunculales | Papaveraceae | Eschscholzia californica | multi-library | Combine 1KP and FGP data; multi-1kp-library assembly in progress |
11 | Basal Eudicots | Ranunculales | Ranunculaceae | Aquilegia formosa X pubescens | - | ESTs in GenBank |
12 | Core Eudicots | Caryophyllales | Amaranthaceae | Kochia scoparia | WGET | |
13 | Core Eudicots/Rosids | Brassicales | Brassicaceae | Arabidopsis thaliana | - | Annotated Genome |
14 | Core Eudicots/Rosids | Brassicales | Brassicaceae | Arabidopsis lyrata | - | Annotated Genome |
15 | Core Eudicots/Rosids | Brassicales | Caricaceae | Carica papaya | - | Annotated Genome |
16 | Core Eudicots/Rosids | Malpighiales | Salicaceae | Populus trichocarpa | - | Annotated Genome |
17 | Core Eudicots/Rosids | Malpighiales | Euphorbiaceae | Ricinus communis | - | Annotated Genome |
18 | Core Eudicots/Rosids | Malpighiales | Euphorbiaceae | Manihot esculenta | - | Annotated Genome |
19 | Core Eudicots/Rosids | Fabales | Fabaceae | Medicago truncatula | - | Annotated Genome |
20 | Core Eudicots/Rosids | Fabales | Fabaceae | Glycine max | - | Annotated Genome |
21 | Core Eudicots/Rosids | Cucurbitales | Cucurbitaceae | Cucumis sativus | - | Annotated Genome |
22 | Core Eudicots/Rosids | Vitales | Vitaceae | Vitis vinifera | - | Annotated Genome |
23 | Core Eudicots/Rosids | Zygophyllales | Zygophyllaceae | Larrea divaricata | UDUT | |
24 | Core Eudicots/Rosids | Rosales | Urticaceae | Boehmeria nivea | ACFP | |
25 | Core Eudicots/Rosids | Malvales | Malvaceae | Hibiscus cannabinus | OLXF | |
26 | Core Eudicots/Asterids | Gentianales | Apocynaceae | Allamanda cathartica | MGVU | awaiting assembly of top-off |
27 | Core Eudicots/Asterids | Gentianales | Apocynaceae | Catharanthus roseus | UOYN | |
28 | Core Eudicots/Asterids | Lamiales | Lamiaceae | Rosmarinus officinalis | FDMM | |
29 | Core Eudicots/Asterids | Lamiales | Phrymaceae | Mimulus guttatus | - | Annotated Genome |
30 | Core Eudicots/Asterids | Solanales | Convolvulaceae | Ipomoea purpurea | multi-library | multi-library assembly in progress |
31 | Core Eudicots/Asterids | Ericales | Ebenaceae | Diospyros malabarica | KVFU | |
32 | Core Eudicots/Asterids | Asterales | Asteraceae | Inula helenium | AFQQ | |
33 | Core Eudicots/Asterids | Asterales | Asteraceae | Tanacetum parthenium | DUQG | |
34 | Monocots | Acorales | Acoraceae | Acorus americanus | - | MonAToL |
35 | Monocots | Dioscoreales | Dioscoreaceae | Dioscorea villosa | OCWZ | |
36 | Monocots | Liliales | Colchicaceae | Colchicum autumnale | NHIX | |
37 | Monocots | Liliales | Smilacaceae | Smilax bona-nox | MWYQ | Sequencing in progress |
38 | Monocots | Asparagales | Asparagaceae | Yucca filamentosa | ICNN | |
39 | Monocots/Commelinids | Arecales | Arecaceae | Chamaedorea seifrizii | - | MonAToL |
40 | Monocots/Commelinids | Poales | Poaceae | Zea mays | - | Annotated Genome |
41 | Monocots/Commelinids | Poales | Poaceae | Sorghum bicolor | - | Annotated Genome |
42 | Monocots/Commelinids | Poales | Poaceae | Brachypodium distachyon |