OneKP Capstone Wiki

The 1000 plants (oneKP or 1KP) initiative is an international, multidisciplinary consortium that has generated large-scale gene sequencing data for over 1000 species of plants. Major supporters include the Alberta Ministry of Innovation and Advanced Education, Musea Ventures (Somekh Family Foundation), Beijing Genomics Institute in Shenzhen (BGI-Shenzhen), China National GeneBank (CNGB), the iPlant Tree-of-Life (iPToL) Grand Challenge, Compute Canada (Westgrid), and Alberta Innovates Technology Futures (AITF-iCORE Strategic Chair). The sample selection was originally based on a series of overlapping sub-projects whose scientific objectives could be addressed by sequencing multiple plant species; descriptions of these sub-projects and the associated species list are available online. As more collaborators joined 1KP, the objectives evolved, and they are now exemplified by a diverse collection of papers. Here we describe the plans for a final capstone paper.

1000 Plants Data Set

We generated an average of 2 Gb of RNA-seq data per sample on the Illumina sequencing platform (GA2 or HiSeq). For the most part, we sequenced one tissue sample per species, although exceptions were made when the scientific objectives required it. Paired-end data were assembled with SOAPdenovo-trans. Each sample typically yielded about 10,000 scaffolds longer than 1 kb. The results are released in password-protected repositories at Westgrid and TACC. These repositories contain raw unassembled reads, assembled transcriptomes, and, as an estimate of gene expression levels, the average read depth across each scaffold.
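The averaged read depth used as an expression estimate can be illustrated with a short Python sketch. It assumes, hypothetically, that read alignments to one scaffold are available as 0-based, half-open `(start, end)` intervals; the function name `mean_read_depth` is illustrative and not part of any 1KP release. Because summing the per-base depth over every position equals the total number of aligned bases, the mean depth is the aligned bases (clipped to the scaffold) divided by the scaffold length.

```python
from typing import Iterable, Tuple


def mean_read_depth(scaffold_length: int,
                    alignments: Iterable[Tuple[int, int]]) -> float:
    """Average per-base read depth across one scaffold.

    `alignments` holds hypothetical 0-based, half-open (start, end) intervals
    of reads mapped to this scaffold; any part outside the scaffold is clipped.
    """
    if scaffold_length <= 0:
        raise ValueError("scaffold_length must be positive")
    aligned_bases = 0
    for start, end in alignments:
        # Clip each alignment to the scaffold boundaries.
        start, end = max(0, start), min(scaffold_length, end)
        if end > start:
            aligned_bases += end - start
    # Sum of per-base depth over all positions == total aligned bases,
    # so dividing by the length gives the mean depth.
    return aligned_bases / scaffold_length


print(mean_read_depth(1000, [(0, 100), (50, 150), (900, 1100)]))  # 0.3
```

Here three alignments each contribute 100 aligned bases (the last one is clipped at the scaffold end), so a 1,000-bp scaffold gets a mean depth of 0.3. Comparing such values across samples would additionally need normalization for sequencing depth, which this sketch does not attempt.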

One of the distinguishing characteristics of 1KP is that we had no restrictions on which species could be sequenced. Although most of our sub-projects had application-focused objectives, the majority of our samples were chosen to represent every species known to science, across the plant kingdom, at some phylogenetically or taxonomically defensible level. For example, we sequenced a representative of nearly all of the 415 known angiosperm families, and about a fifth of our sequenced species are algae. Most of these species had never been subjected to large-scale gene sequencing; as a result, our cumulative efforts have now sequenced approximately two orders of magnitude more genes than all of the public databases combined.

Figure 1: Number of genes sequenced by 1KP compared with the entirety of the NCBI databases as of March 2012. The branches of the phylogenetic tree are weighted by gene count. For artistic reasons, the scale is not uniform along the vertical axis. Black bars at the right indicate the approximate width corresponding to one million genes. Note that the 1KP counts were based on the initial SOAPdenovo assemblies, which contain roughly half as many genes as the new SOAPdenovo-trans assemblies that will be published.

Many bioinformatic analyses are being carried out for project-wide use. A phylogenomics pipeline has been developed; in the process of computing the species tree, it will also provide the consortium with corrected reading frames, multiple sequence alignments, and gene family clusters. These intermediate results will be available for consortium use well before the species trees are computed. Ultimately, every gene tree will be reconciled against the species tree to determine the timing of gene duplications within each gene family relative to speciation events. We are working with iPlant developers to provide web-based visualizations. The analyses under way are described below.
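One standard way to time gene duplications relative to speciation events is LCA (least common ancestor) reconciliation. The Python sketch below is a minimal illustration of that technique, not the 1KP pipeline's actual method; the nested-tuple tree encoding and the names `species_clades`, `lca`, and `reconcile` are hypothetical. Each gene-tree node is mapped to the smallest species-tree clade containing all species below it, and a node that maps to the same clade as one of its children is labeled a duplication.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

# A tree is either a leaf name (str) or a tuple of child subtrees.
Tree = Union[str, tuple]
Clade = FrozenSet[str]


def species_clades(species_tree: Tree) -> List[Clade]:
    """Collect every clade of the species tree as the set of its leaf names."""
    clades: List[Clade] = []

    def walk(node: Tree) -> Clade:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        clades.append(leaves)
        return leaves

    walk(species_tree)
    return clades


def lca(clades: List[Clade], species: Clade) -> Clade:
    """Smallest species-tree clade containing all of `species` (their LCA)."""
    return min((c for c in clades if species <= c), key=len)


def reconcile(gene_tree: Tree, gene_to_species: Dict[str, str],
              species_tree: Tree) -> List[Tuple[Tree, Clade, str]]:
    """LCA-map each internal gene-tree node and label it duplication or speciation."""
    clades = species_clades(species_tree)
    events: List[Tuple[Tree, Clade, str]] = []

    def walk(node: Tree) -> Clade:
        if isinstance(node, str):
            return lca(clades, frozenset([gene_to_species[node]]))
        child_maps = [walk(child) for child in node]
        mapped = lca(clades, frozenset().union(*child_maps))
        # LCA rule: a node mapped to the same species clade as one of its
        # children is inferred to be a duplication; otherwise a speciation.
        event = "duplication" if mapped in child_maps else "speciation"
        events.append((node, mapped, event))  # recorded in post-order
        return mapped

    walk(gene_tree)
    return events


species = (("A", "B"), "C")
genes = ((("a1", "b1"), ("a2", "b2")), "c1")
mapping = {"a1": "A", "b1": "B", "a2": "A", "b2": "B", "c1": "C"}
for node, clade, event in reconcile(genes, mapping, species):
    print(event, sorted(clade))
# speciation ['A', 'B']
# speciation ['A', 'B']
# duplication ['A', 'B']
# speciation ['A', 'B', 'C']
```

In this toy example the node joining the two (A, B) subtrees is a duplication mapped to the (A, B) clade, which places it after the split from C and before the A/B speciation. Real gene trees are larger and often conflict with the species tree for reasons other than duplication, which simple LCA mapping does not model.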

Sorting transcripts into gene families (Naim Matasci): Finished, but the automatically generated OrthoMCL clusters only approximate gene families, and some manual curation will be required.

Alignment, gene tree estimation, and reconciliation with the NCBI taxon tree (Tandy Warnow, Jim Leebens-Mack, and iPlant): Scripts have been written and validated for gene families with fewer than 2,000 genes. We can try to push this limit to 10,000 genes, but it will be difficult. Pipelines for performing the reconciliations and loading them into the viewer are in place. Because the NCBI taxon tree is mostly correct, the fact that we will eventually replace it should not be a bottleneck.

Reconciliation viewer (Naim Matasci): Software components are in place. The plan is to showcase the iPlant website as part of our pilot publication on the phylogenomics of the first ~100 species.

Protein-protein interactions (Ling-Hong Hung): Based on homology to known interacting pairs,

Publications Strategy

All of the data will be released with the publication of a high-impact "capstone" paper that we will submit in 2015, along with multiple companion papers arising from the many 1KP sub-projects and the many analyses performed for the capstone. Even before the capstone is submitted, we expect to have published at least three methodology papers: one on RNA extraction, one on the SOAPdenovo-trans assembler, and one on the phylogenomics of a pilot data set of about 100 species. Many papers are already being published in advance of the capstone because they require only a small portion of the data to be released early, and their objectives are different enough that early publication will not dilute the capstone's impact. All papers are tracked on the consortium wiki.

The capstone will emphasize the diversity of the species we chose. It will have two major components. The first is a phylogenomic analysis of all 1000 species: a species tree will be computed from the subset of "low copy" genes, and all of the gene families (low and high copy number, functionally characterized or not) will be mapped onto this tree and displayed within the iPlant framework. In the process, we expect to resolve some important questions about the transition from unicellular to multicellular organization. The second component, intended to expand our readership beyond the systematics community, is an analysis of the gene changes associated with major evolutionary transitions across the plant kingdom. A core group of the major 1KP contributors has developed a working list (summarized below) of important evolutionary transitions and well-studied gene families.
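Selecting the "low copy" subset could look like the following Python sketch. The actual 1KP selection criteria are not given here, so the rule below (present in at least `min_species` species, with no species contributing more than `max_copies` genes) and its default thresholds are hypothetical placeholders; `low_copy_families` is an illustrative name.

```python
from collections import Counter
from typing import Dict, List


def low_copy_families(families: Dict[str, List[str]],
                      min_species: int = 4,
                      max_copies: int = 1) -> List[str]:
    """Return IDs of gene families that pass a simple low-copy filter.

    `families` maps a family ID to the species label of each member gene.
    The thresholds are hypothetical placeholders, not 1KP's criteria.
    """
    selected = []
    for family_id, member_species in families.items():
        copies_per_species = Counter(member_species)
        # Keep families sampled broadly enough to inform the species tree
        # and without extra paralogs in any single species.
        if (copies_per_species
                and len(copies_per_species) >= min_species
                and max(copies_per_species.values()) <= max_copies):
            selected.append(family_id)
    return selected


families = {
    "F1": ["A", "B", "C", "D"],       # one copy in each of four species
    "F2": ["A", "A", "B", "C", "D"],  # species A carries two paralogs
    "F3": ["A", "B"],                 # sampled in too few species
}
print(low_copy_families(families))  # ['F1']
```

Raising `max_copies` to 2 would also admit F2. The design choice is a trade-off: stricter filters leave fewer families, but hidden paralogs can mislead species-tree estimation, which is the usual reason for building the tree from low-copy genes.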

Central to this plan is the recruitment of gene family experts to help interpret our results. We do not expect every recruited expert to discover something truly novel, nor is it necessary (or feasible) for us to analyze every gene family known to science. The idea is to cast a wide net, so that when we write the capstone, we can cherry-pick the most interesting and compelling results to incorporate into that paper. Given the inevitable page limits, we are unlikely to have space to discuss more than a few evolutionary transitions and gene families. Any unused analyses will go into the expected companion papers. All collaborators are encouraged to publish their own papers, with controlling authorship and on their own timeline.

The capstone will be published under a consortium byline, "the 1000 plants (1KP) consortium", similar to the ENCODE Project Consortium papers featured on the September 6, 2012 cover of Nature. Everyone who contributed to the data collection or the data analysis will be included in the author list, and individual contributions will be clearly stated. Although it is our intention and desire that people use these data, we cannot jeopardize the capstone by allowing too much data to be released prematurely, or by allowing scientific conclusions destined for the capstone to be published in advance. In general, analyses with broad phylogenetic scope risk conflicting with the capstone. We trust our collaborators to exercise appropriate self-restraint, and to appreciate that we want their best results to be incorporated into the capstone itself.

Additional papers strong enough to justify publication independently of the capstone are most welcome. However, their objectives must be sufficiently different (e.g., the correlation between polyploidy and angiosperm diversification) that journal editors will not ask us to merge everything together. Any paper with conceptual overlap with the capstone should be submitted to another journal and/or submitted after the capstone.

Selected Transitions

The green plant tree of life is marked by numerous innovations, including the evolution of multicellularity, transitions from marine to freshwater and terrestrial environments, maternal retention of zygotes and embryos, the evolution of complex life histories including haploid and diploid phases, and the origin of vascular systems, the seed, and the flower. These innovations define key transitions in the history of green plants and the origin of diverse groups of plants that form the foundation of local and global biota. Having sampled transcriptomes across the green tree of life, 1KP is in an unprecedented position to assess changes in gene content or gene expression associated with each of these key transitions. Our phylogenomics pipeline is placing assembled transcripts into gene family alignments and gene trees, and all 1KP consortium members will be given access to these analyses. To assist consortium members who are unfamiliar with the sequenced plant species and their evolutionary history, species/taxon consultants have been assigned to each transition. Some other important transitions (e.g., the origin of stomatophytes, which falls between those of embryophytes and tracheophytes) are not listed; for these, the relevant consultant(s) are those for the next largest group.

Origin of Viridiplantae (Green Plants) 

Species/Taxon Consultant: Michael Melkonian

Characteristic Innovations: origin of plastids with Chl a+b, intraplastidial starch synthesis, nuclear localization of the gene for RbcS, origin of whiplash flagella

Origin of Streptophyta

Species/Taxon Consultant: Michael Melkonian

Characteristic Innovations: possible transition to freshwater, change in cell division, origin of the phragmoplast (compared with the origin of the phycoplast), asymmetrically attached flagella, origin of plasmodesmata

Origin of Embryophytes (Land Plants)

Species/Taxon Consultant: Michael Melkonian; Sean Graham; Dennis Stevenson

Characteristic Innovations: cuticle, poikilohydry (desiccation tolerance/water control in somatic protoplasm), food-conducting cells, lignin and lignin precursors (non-mechanical functions) -- any evidence of these in algae?

Origin of Tracheophytes (Vascular Plants)

Species/Taxon Consultant: Sean Graham; Dennis Stevenson

Characteristic Innovations: homoiohydry (stable water supply to tissues), including less desiccation control, and origin of 'true' vascular tissue (xylem with lignified tracheids, and phloem as a specialized food-conducting tissue); autonomous sporophytes; reduced thalloid gametophytes

Origin of Euphyllophytes

Species/Taxon Consultant: Sean Graham; Dennis Stevenson

Characteristic Innovations: overtopping growth form

Origin of Spermatophytes (Seed Plants)

Species/Taxon Consultant: Sean Graham; Dennis Stevenson

Characteristic Innovations: axillary shoot branching

Origin of Angiosperms (Flowering Plants)

Species/Taxon Consultant: Doug Soltis; Pam Soltis; Sean Graham; Dennis Stevenson

Characteristic Innovations: extremely reduced gametophytes

Diversification of Mesangiosperms (including Monocots, Eudicots, Magnoliids) 

Species/Taxon Consultant: Doug Soltis; Pam Soltis; Sean Graham; Dennis Stevenson; Jim Leebens-Mack

Characteristic Innovations: (origin of monocots) calcium oxalate raphides, no vessels in leaves, steroidal saponins, diffuse vascular bundles; (origin of core eudicots) ellagic and gallic acids

Gene Family Experts

The following table lists the gene family experts who have accepted our invitation to join the capstone analysis. We are continually updating the table as more people join 1KP. To avoid conflicts, we prefer to assign one expert to each gene family, but we make exceptions when people indicate a willingness to collaborate on a particular gene family. Everyone is encouraged to publish their findings as they see fit and on their own timeline, but should appreciate that once the capstone is published, all of the sequences will be public.

Gene family | Expert | Institution
ABC transporters | Neal Stewart | University of Tennessee
ABC1 kinases and Clp proteases | Klaas van Wijk | Cornell University
AGP and GT2 genes | Tony Bacic | University of Melbourne
AGP and GT2 genes | Monika Doblin | University of Melbourne
ammonium and phosphate transporters | Pierre-Emmanuel Courty | University of Basel
AP2 domain proteins | Michael Holdsworth | University of Nottingham
auxin network and F-box genes | Markus Geisler | University of Fribourg
auxin network and F-box genes | Ivo Grosse | Martin Luther Universität Halle-Wittenberg
auxin network and F-box genes | Martin Porsch | Martin Luther Universität Halle-Wittenberg
auxin network and F-box genes | Marcel Quint | Leibniz Institute of Plant Biochemistry
BAHD | John D'Auria | Texas Tech University
bHLH and TCP genes | Victor Albert | SUNY University at Buffalo
bHLH and TCP genes | Lorenzo Carretero-Paulet | SUNY University at Buffalo
BR signaling pathway | Zhi-Yong Wang | Carnegie Institution for Science
chromatin methylation | Robert Schmitz | University of Georgia
chromatin methylation | Adam Bewick | University of Georgia
ciliome biology | Steven Kelly | Oxford University
ciliome biology | Jane Langdale | Oxford University
circadian clock genes | Ulf Lagercrantz | Uppsala University
cullin-RING ubiquitin protein ligases | Richard Vierstra | University of Wisconsin
cuticle biology and wax synthesis | Ljerka Kunst | University of British Columbia
cuticle biology and wax synthesis | Jocelyn Rose | Cornell University
defense peptides | Christian Gruber | Medical University of Vienna
folate synthesis and other B vitamins | Andrew Hanson | University of Florida
glucosinolate biosynthesis | Barbara Halkier | University of Copenhagen
glycosyltransferase families GT47 and GT77 | Jesper Harholt | University of Copenhagen
glycosyltransferase families GT47 and GT77 | Peter Ulvskov | University of Copenhagen
glycosyltransferase family 1 and glycoside hydrolase family 28 | Luiz-Eduardo Del-Bem | Harvard School of Public Health
GSK3/Shaggy-like kinases and cell adhesion | Juliet Coates | University of Birmingham
GSK3/Shaggy-like kinases and cell adhesion | Younousse Saidi | University of Birmingham
HDZ3/ZPR | Pamela Soltis | University of Florida
histone deacetylases | Stéphane Bourque | Université de Bourgogne
isoprenyl diphosphate synthase | Feng Chen | University of Tennessee
kinases | Shin-Han Shiu | Michigan State University
leaf and fruit development | Barbara Ambrose | New York Botanical Garden
LysM RKs | Thorsten Nürnberger | University of Tübingen
MADS-box | Guenter Theissen | Friedrich Schiller University of Jena
mycorrhizal and rhizobial associations | Giles Oldroyd | John Innes Centre
nitric oxide synthase | Sylvain Jeandroz | Université de Bourgogne
nitric oxide synthase | David Wendehenne | Université de Bourgogne
P450 | David Nelson | University of Tennessee
P450 | Danièle Werck-Reichhart | Institut de Biologie Moléculaire des Plantes
peroxidase, class III | Christophe Dunand | University Paul Sabatier (Toulouse III)
phenylpropanoid | Clint Chapple | Purdue University
photosynthesis | Xinguang Zhu | CAS-MPG Partner Institute for Computational Biology
phytochrome | Sarah Mathews | Arnold Arboretum (Harvard University)
PP2C phosphatases | Christian Doerig | Monash University
PPR proteins | Patrick Finnegan | University of Western Australia
PPR proteins | Ian Small | University of Western Australia
PYR/PYL/RCAR ABA receptors and DNA demethylation pathway | Shaojun Xie | Purdue University
PYR/PYL/RCAR ABA receptors and DNA demethylation pathway | Jian-Kang Zhu | Purdue University
retinoblastoma-related (and isoprenoid synthesis) | Wilhelm Gruissem | Eidgenössische Technische Hochschule Zürich
SABBATH methyltransferases | Todd Barkman | Western Michigan University
secondary growth and wood formation | Andrew Groover | US Forest Service at Davis
sugar/sucrose transporters | Daniel Wipf | Université de Bourgogne
sulphate transporters | Leonardo Casieri | Université de Bourgogne
terpene synthase | Feng Chen | University of Tennessee
transcription factors | Stefan Rensing | University of Marburg
tubulin | Jack Tuszynski | University of Alberta