OneKP Capstone Wiki
The 1000 plants (oneKP or 1KP) initiative is an international multi-disciplinary consortium that has generated large-scale gene sequencing data for over 1000 species of plants. Major supporters include Alberta Ministry of Innovation and Advanced Education, Musea Ventures (Somekh Family Foundation), Beijing Genomics Institute in Shenzhen (BGI-Shenzhen), China National GeneBank (CNGB), iPlant Tree-of-Life (iPToL) Grand Challenge, Compute Canada (Westgrid), and Alberta Innovates Technology Futures (AITF-iCORE Strategic Chair). The sample selection was originally based on a series of overlapping sub-projects with scientific objectives that could be addressed by sequencing multiple plant species; descriptions of these sub-projects and the associated species list can be found at http://www.onekp.com. As more collaborators joined 1KP, the objectives evolved and are now exemplified by a diverse collection of papers. Here we describe the plans for a final capstone paper.
1000 Plants Data Set
We generated an average of 2 Gb of RNA-seq data per sample on the Illumina sequencing platform (GA2 or HiSeq). For the most part, we sequenced one tissue sample per species, although exceptions were made when the scientific objectives required it. Paired-end data were assembled by SOAPdenovo-trans (http://soap.genomics.org.cn/SOAPdenovo-Trans.html). Each sample typically yielded about 10,000 scaffolds longer than 1 kb. The results are released on password-protected repositories at Westgrid (http://onekp.westgrid.ca/1kp-data) and TACC (http://web.corral.tacc.utexas.edu/OneKP). These repositories contain raw unassembled reads, assembled transcriptomes, and, as an estimate of gene expression levels, read depths averaged across each scaffold.
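As an illustration of the expression estimate mentioned above, the following minimal sketch computes a mean read depth per scaffold and keeps only scaffolds longer than 1 kb. It is not the consortium's code: the `Scaffold` record, its `mapped_bases` field, and the function names are hypothetical, and it assumes the total number of read bases aligned to each scaffold is already known.

```python
from typing import Dict, Iterable, NamedTuple


class Scaffold(NamedTuple):
    """Hypothetical record for one assembled scaffold."""
    name: str
    length: int        # scaffold length in bases
    mapped_bases: int  # total read bases aligned to this scaffold (assumed input)


def mean_depth(scaffold: Scaffold) -> float:
    """Average read depth: aligned bases divided by scaffold length."""
    if scaffold.length <= 0:
        raise ValueError("scaffold length must be positive")
    return scaffold.mapped_bases / scaffold.length


def long_scaffold_depths(scaffolds: Iterable[Scaffold],
                         min_length: int = 1000) -> Dict[str, float]:
    """Mean depth for every scaffold strictly longer than min_length bases."""
    return {s.name: mean_depth(s) for s in scaffolds if s.length > min_length}


if __name__ == "__main__":
    demo = [Scaffold("scaffold_1", 2000, 40000), Scaffold("scaffold_2", 500, 5000)]
    print(long_scaffold_depths(demo))
```

Averaging over the whole scaffold gives one number per transcript; it smooths over uneven coverage along the scaffold, which is why it is only an estimate of expression.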
One of the distinguishing characteristics of 1KP is that we had no restrictions on the species that had to be sequenced. Although most of our sub-projects had applications-focused objectives, most of our samples were chosen so that every species known to science, across the plant kingdom, is represented at some phylogenetically or taxonomically defensible level. For example, we sequenced a representative of nearly all of the 415 known angiosperm families, and about a fifth of our sequenced species are algae. Most of these species have never been subjected to large-scale gene sequencing, and as a result, our cumulative efforts have now sequenced approximately 2 orders of magnitude more genes than the totality of the public databases.
Figure 1: Number of genes sequenced by 1KP as compared to the entirety of the NCBI databases as of March 2012. The branches of the phylogenetic tree are weighted by gene count. For artistic reasons, the scale is not uniform along the vertical axis. Black bars at the right indicate the approximate widths for one million genes. Note that the 1KP counts were based on the initial SOAPdenovo assemblies, which contain roughly half as many genes as the new SOAPdenovo-trans assemblies that will be published.
Many bioinformatic analyses are being carried out for project-wide consumption. A phylogenomics pipeline (https://pods.iplantcollaborative.org/wiki/display/iptol/1KP_Phyloinformatics_Pipeline) has been developed and, in the process of computing the species tree, it will also provide the consortium with corrected reading frames, multiple sequence alignments, and gene family clusters. These results will be available for consortium use well before the species trees are computed. Ultimately, every gene tree will be reconciled against the species tree, to determine the timing of the gene duplications within each gene family relative to the speciation events. We are working with iPlant developers to provide web-based visualizations. The following describes the analyses being done.
Sorting transcripts into gene families (Naim Matasci): Finished; but automatically generated orthoMCL clusters are only approximations to gene families and some manual curation will be required.
Alignment and gene tree estimation and reconciliation with NCBI taxon tree (Tandy Warnow, Jim Leebens-Mack, and iPLANT): Scripts have been written and validated for gene families with <2000 genes. We can try pushing this to 10,000 but it will be difficult. Pipelines for doing the reconciliations and placing them into the viewer are in place. The fact that we will eventually replace the NCBI taxon tree should not be a bottleneck as it is mostly correct.
Reconciliation viewer (Naim Matasci): Software components are in place. The plan is to showcase the iPLANT website as part of our pilot publication on the phylogenomics of the first ~100 species.
Protein-protein interactions (Ling-Hong Hung): Based on homology to known interacting pairs; see http://cando.compbio.washington.edu/wiki/CANDO.
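The reconciliation step listed above, deciding whether each gene tree node is a duplication or a speciation relative to the species tree, can be illustrated with LCA mapping, a standard textbook approach. This is a minimal sketch under stated assumptions and not the consortium's pipeline: trees are encoded as nested tuples, gene names follow a hypothetical `species|gene` convention, and all function names are invented for the example.

```python
def leaf_set(tree):
    """Set of leaf labels below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def species_clades(tree):
    """Leaf sets of every node in the species tree, leaves included."""
    clades = [leaf_set(tree)]
    if not isinstance(tree, str):
        for child in tree:
            clades.extend(species_clades(child))
    return clades


def lca(clades, species):
    """Smallest species-tree clade that contains all the given species."""
    return min((c for c in clades if species <= c), key=len)


def gene_species(tree):
    """Species represented below a gene-tree node ('species|gene' leaves)."""
    if isinstance(tree, str):
        return frozenset([tree.split("|")[0]])
    return frozenset().union(*(gene_species(child) for child in tree))


def duplications(gene_tree, clades):
    """Species-tree clades at which gene-tree nodes are inferred duplications.

    A node is called a duplication when it maps to the same species-tree
    clade as at least one of its children; otherwise it is a speciation.
    """
    if isinstance(gene_tree, str):
        return []
    here = lca(clades, gene_species(gene_tree))
    found = []
    if any(lca(clades, gene_species(child)) == here for child in gene_tree):
        found.append(here)
    for child in gene_tree:
        found.extend(duplications(child, clades))
    return found


if __name__ == "__main__":
    species_tree = (("A", "B"), "C")
    gene_tree = ((("A|g1", "B|g1"), ("A|g2", "B|g2")), "C|g1")
    print(duplications(gene_tree, species_clades(species_tree)))
```

In the toy example the only duplication maps to the ancestor of species A and B, which places it after the split from C and before the A/B speciation. That is the kind of relative timing the reconciliation is meant to provide.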
Publications Strategy
All of the data will be released with the publication of a high-impact "capstone" paper that we will submit in 2015, along with multiple companion papers arising from the many 1KP sub-projects and the many analyses performed for the capstone. Even before the capstone is submitted, we expect to have published at least 3 methodology papers: on the RNA extractions, on the SOAPdenovo-trans assembler, and on the phylogenomics for a pilot data set of about 100 species. Many papers are already being published in advance of the capstone, because they require little of the data to be released prematurely, and their objectives are so different that early publication will not dilute the capstone's impact. All papers are tracked on the consortium wiki.
The capstone will emphasize the diversity of the species we chose. It will have 2 major components, starting with a phylogenomics analysis of all 1000 species. A species tree will be computed from the subset of "low copy" genes, and on this tree, all of the gene families (low and high copy number, functionally characterized or not) will be attached and displayed within the iPlant framework. We expect in the process to resolve some important questions on the evolution from single-celled to multi-cellular organisms. To expand our readership beyond the systematics community, we will also perform an analysis of the gene changes associated with major evolutionary transitions across the plant kingdom. A core group of the major 1KP contributors has developed a working list (summarized below) of important evolutionary transitions and well-studied gene families.
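To make the idea of a "low copy" subset concrete, here is a minimal sketch that selects gene families in which no species contributes more than a chosen number of genes. The threshold (`max_copies`), the data layout, and the function names are assumptions made for illustration; the source does not state how the consortium defines low copy.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical layout: family name -> list of (species, gene_id) members.
Families = Dict[str, List[Tuple[str, str]]]


def is_low_copy(members: List[Tuple[str, str]], max_copies: int = 1) -> bool:
    """True if every species has at most max_copies genes in the family."""
    per_species = Counter(species for species, _gene in members)
    return all(count <= max_copies for count in per_species.values())


def low_copy_families(families: Families, max_copies: int = 1) -> List[str]:
    """Names of the families that pass the low-copy filter, in input order."""
    return [name for name, members in families.items()
            if is_low_copy(members, max_copies)]


if __name__ == "__main__":
    demo = {
        "famA": [("sp1", "g1"), ("sp2", "g2")],
        "famB": [("sp1", "g3"), ("sp1", "g4"), ("sp2", "g5")],
    }
    print(low_copy_families(demo))
```

A strict single-copy threshold reduces the risk of mixing paralogs into the species tree inference, at the cost of discarding families; a looser threshold keeps more data but needs an extra step to choose one copy per species.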
Central to this plan is the recruitment of gene family experts to help interpret our results. We do not expect that all of the recruited experts will discover something truly novel, and neither is it necessary (or feasible) for us to analyze every gene family known to science. The idea is to cast a wide net, so that when we write the capstone, we can cherry-pick the most interesting and compelling results to incorporate into that paper. Given the inevitable page limits, it is unlikely that we will have enough space to discuss more than a few evolutionary transitions and gene families. Any unused analyses will go into the expected companion papers. All collaborators are encouraged to publish their own papers with controlling authorship and on their own timeline.
The capstone will be published under a consortium byline, "the 1000 plants (1KP) consortium", similar to the ENCODE project consortium on the September 6th 2012 cover of Nature. Everyone who contributed to the data collection or the data analysis will be included in the author list; individual contributions will be clearly stated. Although it is our intention and desire that people use this data, we cannot sabotage the capstone by allowing too much data to be released prematurely, or by allowing scientific conclusions destined for the capstone to be published in advance. In general, analyses of sufficient phylogenetic breadth run the risk of potential conflict with the capstone. We trust our collaborators to exercise the appropriate self-restraint, and also to appreciate that we want their best results to be incorporated into the capstone itself.
Additional papers, strong enough to justify a separate publication independent of the capstone, are most welcome. However, the objectives must be sufficiently different (e.g. correlation of polyploidy and angiosperm diversification) to prevent the journal editors from asking us to merge everything together. Any paper that has a conceptual overlap with the capstone should be submitted to another journal and/or submitted after the capstone.
Selected Transitions
The green plant tree of life is marked by numerous innovations including the evolution of multi-cellularity, transitions from marine to freshwater and terrestrial environments, maternal retention of zygotes and embryos, the evolution of complex life histories including haploid and diploid phases, and the origin of vascular systems, the seed and the flower. These innovations define key transitions in the history of green plants and the origin of diverse groups of plants that form the foundation of our local and global biota. Having sampled transcriptomes across the green tree of life, 1KP is in an unprecedented position to assess changes in gene content or gene expression associated with each of these key transitions. Our phylogenomics pipeline (https://pods.iplantcollaborative.org/wiki/display/iptol/1KP_Phyloinformatics_Pipeline) is placing assembled transcripts into gene family alignments and gene trees. All 1KP consortium members will be given access to these analyses. To assist consortium members who are unfamiliar with the sequenced plant species and their evolutionary history, species/taxon consultants have been assigned to each transition. There are other important transitions (e.g., stomatophytes between embryophytes and tracheophytes) that we did not list. In those instances, the relevant expert(s) would be the ones for the next largest group.
Origin of Viridiplantae (Green Plants)
Species/Taxon Consultant: Michael Melkonian, E-mail michael.melkonian@uni-koeln.de
Characteristic Innovations: origin of plastids with Chl a+b, intraplastidial starch synthesis, nuclear localization of the gene for RbcS, origin of whiplash flagella
Origin of Streptophyta
Species/Taxon Consultant: Michael Melkonian, E-mail michael.melkonian@uni-koeln.de
Characteristic Innovations: possible transition to freshwater, change in cell division, origin of phragmoplast and comparison to origin of phycoplast, flagella asymmetrically attached, origin of plasmodesmata
Origin of Embryophytes (Land Plants)
Species/Taxon Consultant: Michael Melkonian, E-mail michael.melkonian@uni-koeln.de; Sean Graham, E-mail swgraham@interchange.ubc.ca; Dennis Stevenson, E-mail dws@nybg.org
Characteristic Innovations: cuticle, poikilohydry (desiccation tolerance/water control in somatic protoplasm), food-conducting cells, lignin and lignin-precursors (non-mechanical functions) -- any evidence of these in algae?
Origin of Tracheophytes (Vascular Plants)
Species/Taxon Consultant: Sean Graham, E-mail swgraham@interchange.ubc.ca; Dennis Stevenson, E-mail dws@nybg.org
Characteristic Innovations: homoiohydry (stable water supply to tissues) including less desiccation control and origin of 'true' vascular tissue -- xylem with lignified tracheids, specialized food-conducting apparatus -- phloem tissue; autonomous sporophytes, reduced thalloid gametophytes
Origin of Euphyllophytes
Species/Taxon Consultant: Sean Graham, E-mail swgraham@interchange.ubc.ca; Dennis Stevenson, E-mail dws@nybg.org
Characteristic Innovations: overtopping growth form
Origin of Spermatophytes (Seed Plants)
Species/Taxon Consultant: Sean Graham, E-mail swgraham@interchange.ubc.ca; Dennis Stevenson, E-mail dws@nybg.org
Characteristic Innovations: axillary shoot branching
Origin of Angiosperms (Flowering Plants)
Species/Taxon Consultant: Doug Soltis, E-mail dsoltis@botany.ufl.edu; Pam Soltis, E-mail psoltis@flmnh.ufl.edu; Sean Graham, E-mail swgraham@interchange.ubc.ca; Dennis Stevenson, E-mail dws@nybg.org
Characteristic Innovations: extremely reduced gametophytes
Diversification of Mesangiosperms (including Monocots, Eudicots, Magnoliids)
Species/Taxon Consultant: Doug Soltis, E-mail dsoltis@botany.ufl.edu; Pam Soltis, E-mail psoltis@flmnh.ufl.edu; Sean Graham, E-mail swgraham@interchange.ubc.ca; Dennis Stevenson, E-mail dws@nybg.org; Jim Leebens-Mack, E-mail jleebensmack@plantbio.uga.edu
Characteristic Innovations: (origin of monocots) calcium oxalate raphides, no vessels in leaves, steroidal saponins, diffuse vascular bundles, (origin of core eudicots) ellagic and gallic acids
Gene Family Experts
The following table lists the gene family experts who have accepted our invitation to join the capstone analysis. We are continually updating the table as more people join 1KP. To avoid conflict we prefer to assign one expert to each gene family, but we do make exceptions if people indicate a willingness to collaborate on a particular gene family. Everyone is encouraged to publish their findings as they see fit and on their own timeline, but they should appreciate that once the capstone is published all of the sequences will be public.
BIOLOGICAL PROCESS OR GENE FAMILY | FIRSTNAME | SURNAME | AFFILIATION |
ABC transporters | Neal | Stewart | University of Tennessee |
ABC1 kinases and Clp proteases | Klaas | van Wijk | Cornell University |
AGP and GT2 genes | Tony | Bacic | University of Melbourne |
AGP and GT2 genes | Monika | Doblin | University of Melbourne |
ammonium and phosphate transporters | Pierre-Emmanuel | Courty | University of Basel |
AP2 domain proteins | Michael | Holdsworth | University of Nottingham |
auxin network and F-box genes | Markus | Geisler | University of Fribourg |
auxin network and F-box genes | Ivo | Grosse | Martin Luther Universität Halle-Wittenberg |
auxin network and F-box genes | Martin | Porsch | Martin Luther Universität Halle-Wittenberg |
auxin network and F-box genes | Marcel | Quint | Leibniz Institute of Plant Biochemistry |
BAHD | John | D'Auria | Texas Tech University |
bHLH and TCP genes | Victor | Albert | SUNY University at Buffalo |
bHLH and TCP genes | Lorenzo | Carretero-Paulet | SUNY University at Buffalo |
BR signaling pathway | Zhi-Yong | Wang | Carnegie Institution for Science |
chromatin methylation | Robert | Schmitz | University of Georgia |
chromatin methylation | Adam | Bewick | University of Georgia |
ciliome biology | Steven | Kelly | Oxford University |
ciliome biology | Jane | Langdale | Oxford University |
circadian clock genes | Ulf | Lagercrantz | Uppsala University |
cullin-RING ubiquitin protein ligases | Richard | Vierstra | University of Wisconsin |
cuticle biology and wax synthesis | Ljerka | Kunst | University of British Columbia |
cuticle biology and wax synthesis | Jocelyn | Rose | Cornell University |
defense peptides | Christian | Gruber | Medical University of Vienna |
folate synthesis and other B vitamins | Andrew | Hanson | University of Florida |
glucosinolate biosynthesis | Barbara | Halkier | University of Copenhagen |
glycosyltransferase families GT47 and GT77 | Jesper | Harholt | University of Copenhagen |
glycosyltransferase families GT47 and GT77 | Peter | Ulvskov | University of Copenhagen |
glycosyltransferase family 1 and glycoside hydrolase family 28 | Luiz-Eduardo | Del-Bem | Harvard School of Public Health |
GSK3/Shaggy-like kinases and cell adhesion | Juliet | Coates | University of Birmingham |
GSK3/Shaggy-like kinases and cell adhesion | Younousse | Saidi | University of Birmingham |
HDZ3/ZPR | Pamela | Soltis | University of Florida |
histone deacetylases | Stéphane | Bourque | Université de Bourgogne |
isoprenyl diphosphate synthase | Feng | Chen | University of Tennessee |
kinases | Shin-Han | Shiu | Michigan State University |
leaf and fruit development | Barbara | Ambrose | New York Botanical Garden |
LysM RKs | Thorsten | Nürnberger | University of Tübingen |
MADS-box | Guenter | Theissen | Friedrich Schiller University of Jena |
mycorrhizal and rhizobial associations | Giles | Oldroyd | John Innes Centre |
nitric oxide synthase | Sylvain | Jenadroz | Université de Bourgogne |
nitric oxide synthase | David | Wendehenne | Université de Bourgogne |
P450 | David | Nelson | University of Tennessee |
P450 | Danièle | Werck-Reichhart | Institut de Biologie Moléculaire des Plantes |
peroxidase, class III | Christophe | Dunand | University Paul Sabatier (Toulouse III) |
phenylpropanoid | Clint | Chapple | Purdue University |
photosynthesis | Xinguang | Zhu | CAS-MPG Partner Institute for Computational Biology |
phytochrome | Sarah | Mathews | Arnold Arboretum (Harvard University) |
PP2C phosphatases | Christian | Doerig | Monash University |
PPR proteins | Patrick | Finnegan | University of Western Australia |
PPR proteins | Ian | Small | University of Western Australia |
PYR/PYL/RCAR ABA receptors and DNA demethylation pathway | Shaojun | Xie | Purdue University |
PYR/PYL/RCAR ABA receptors and DNA demethylation pathway | Jian-Kang | Zhu | Purdue University |
retinoblastoma-related (and isoprenoid synthesis) | Wilhelm | Gruissem | Eidgenössische Technische Hochschule Zürich |
SABATH methyltransferases | Todd | Barkman | Western Michigan University |
secondary growth and wood formation | Andrew | Groover | US Forest Service at Davis |
sugar/sucrose transporters | Daniel | Wipf | Université de Bourgogne |
sulphate transporters | Leonardo | Casieri | Université de Bourgogne |
terpene synthase | Feng | Chen | University of Tennessee |
transcription factors | Stefan | Rensing | University of Marburg |
tubulin | Jack | Tuszynski | University of Alberta |