1KP_Pilot_Study_DeepGreen

1KP data analysis working group

In November 2009, a NESCent/iPlant-sponsored 1KP analysis workshop was held in Phoenix to bring together members of the iPlant Tree of Life (iPToL) Tree Reconciliation Working Group, the 1KP project, and experts in phylogeny estimation using large, multi-gene data sets. The 1000 Plant Transcriptome Sequencing Initiative (www.onekp.com) aims to resolve relationships across the green plant phylogeny in order to elucidate the processes contributing to diversification and to biological innovations, including the origins of multicellularity, the colonization of land, the evolution of vascular systems, and the origins of seeds and flowers. In pursuit of these goals, the project is generating an unparalleled plant sequence database for investigating the evolution of gene families, regulatory networks and biosynthetic pathways. In collaboration with iPlant, the 1KP project plans to make all transcriptome sequences, gene trees, species trees and tree reconciliations available to the plant science community through an iPlant Discovery Environment.

As part of the iPlant Assembling the Tree of Life (iPToL) grand challenge project, the Tree Reconciliation Working Group and collaborators are developing pipelines for the analysis of gene families within the context of organismal phylogenies. The working group will first analyze a pilot set of strategically placed streptophytic algae and land plant transcriptomes (see table) being generated by the 1KP project. This pilot set will serve as a focal point for developing workflows aimed at circumscribing gene families and estimating gene trees and species trees. Results will be shared with the larger community of iPToL and 1KP collaborators and prepared for publication.


Major systematics questions to be addressed in the pilot project:

  • Which algal lineage(s) are sister to the land plants?

  • Do mosses, hornworts and liverworts form a clade?

  • To what degree is the timing of gene duplication events correlated across gene families?

    • Do we see diversification of gene families and/or functional groups associated with the origin of land plants, vascular plants, seed plants and/or flowering plants?

  • Are gene trees affected by sparse taxon sampling?

  • Given computational limitations (see below), is it feasible to go even further back in time to resolve the earliest branching events in the history of green algae?


Some of the major computational questions include:

  • Given existing tools, how can we best infer species trees from reconciliation of unrooted and often poorly resolved (very deeply branching) gene trees?

  • What are the limits of the power of currently available gene tree/species tree reconciliation methods?

  • How do we summarize the uncertainty in gene tree reconciliation in a way that captures uncertainty in the species and gene tree topologies as well as the rooting of both?

  • How do we visualize the results of gene tree/species tree reconciliations?

  • Where are the computational bottlenecks and how do we scale these analyses up?
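
To make the reconciliation questions above concrete, here is a minimal sketch of the classic LCA-mapping approach to gene tree/species tree reconciliation. It is an illustration only, not the working group's pipeline. It assumes rooted, fully resolved binary trees written as nested tuples, which is exactly what the questions above say is often not available for deeply branching gene families. The taxon names, the `species_gene` leaf-naming convention, and all function names are hypothetical.

```python
def leaf_set(tree):
    """Return the set of leaf names below a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def species_clades(tree):
    """List every clade of the species tree as a set of species names."""
    if isinstance(tree, str):
        return [leaf_set(tree)]
    clades = [leaf_set(tree)]
    for child in tree:
        clades.extend(species_clades(child))
    return clades


def lca(clades, species):
    """Smallest species-tree clade containing all of `species`."""
    return min((c for c in clades if species <= c), key=len)


def count_duplications(gene_tree, clades,
                       species_of=lambda leaf: leaf.split("_")[0]):
    """Map a rooted binary gene tree onto the species tree.

    Returns (clade the root maps to, number of inferred duplications).
    A gene-tree node is scored as a duplication when it maps to the
    same species-tree clade as one of its children.
    """
    if isinstance(gene_tree, str):
        return lca(clades, frozenset([species_of(gene_tree)])), 0
    left, right = gene_tree
    m_left, d_left = count_duplications(left, clades, species_of)
    m_right, d_right = count_duplications(right, clades, species_of)
    m_here = lca(clades, m_left | m_right)
    is_dup = m_here == m_left or m_here == m_right
    return m_here, d_left + d_right + int(is_dup)


# Hypothetical three-taxon species tree and a gene family with two copies
# in the land plants (leaf names follow an assumed species_gene pattern).
species_tree = ("alga", ("moss", "angiosperm"))
gene_tree = (("alga_a", ("moss_a", "angiosperm_a")),
             ("moss_b", "angiosperm_b"))

clades = species_clades(species_tree)
mapped_to, n_dups = count_duplications(gene_tree, clades)
print(sorted(mapped_to), n_dups)  # root maps to all three taxa, 1 duplication
```

Because the subset scan in `lca` visits every species-tree clade for every gene-tree node, this sketch scales poorly; that is one small instance of the bottleneck question raised above. It also says nothing about rooting or topological uncertainty, which would require repeating the mapping over alternative rootings or samples of trees.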



Our experience with the pilot analysis will form the foundation for much larger analyses of the full 1000-transcriptome data set to be generated over the next 24 months. Most importantly, we aim to address the computational challenges anticipated with analyses of the complete 1KP data set.

Table: Taxa to be included in the pilot dataset from the 1KP project

 

| # | Clade | Order | Family | Species | 1KP Code | Comments |
|---|-------|-------|--------|---------|----------|----------|
| 1 | Basalmost angiosperms | Amborellales | Amborellaceae | Amborella trichopoda | URDJ | Combine 1KP and AAGP data; update 1KP assembly |
| 2 | Basalmost angiosperms | Nymphaeales | Nymphaeaceae | Nuphar advena | WTKZ | Combine 1KP and AAGP data |
| 3 | Basalmost angiosperms | Austrobaileyales | Illiciaceae/Schisan. | Kadsura heteroclita | NWMY | |
| 4 | Magnoliid | Piperales | Saururaceae | Houttuynia cordata | CSSK | |
| 5 | Magnoliid | Piperales | Aristolochiaceae | Saruma henryi | QDVW | Combine 1KP and FGP data |
| 6 | Magnoliid | Magnoliales | Magnoliaceae | Liriodendron tulipifera | - | AAGP data |
| 7 | Magnoliid | Laurales | Lauraceae | Persea americana | - | AAGP data |
| 8 | Chloranthales | Chloranthales | Chloranthaceae | Sarcandra glabra | OSHQ | |
| 9 | Basal Eudicots | Ranunculales | Berberidaceae | Podophyllum peltatum | WFBF | |
| 10 | Basal Eudicots | Ranunculales | Papaveraceae | Eschscholzia californica | multi-library | Combine 1KP and FGP data; multi-1KP-library assembly in progress |
| 11 | Basal Eudicots | Ranunculales | Ranunculaceae | Aquilegia formosa X pubescens | - | ESTs in GenBank |
| 12 | Core Eudicots | Caryophyllales | Amaranthaceae | Kochia scoparia | WGET | |
| 13 | Core Eudicots/Rosids | Brassicales | Brassicaceae | Arabidopsis thaliana | - | Annotated Genome |
| 14 | Core Eudicots/Rosids | Brassicales | Brassicaceae | Arabidopsis lyrata | - | Annotated Genome |
| 15 | Core Eudicots/Rosids | Brassicales | Caricaceae | Carica papaya | - | Annotated Genome |
| 16 | Core Eudicots/Rosids | Malpighiales | Salicaceae | Populus trichocarpa | - | Annotated Genome |
| 17 | Core Eudicots/Rosids | Malpighiales | Euphorbiaceae | Ricinus communis | - | Annotated Genome |
| 18 | Core Eudicots/Rosids | Malpighiales | Euphorbiaceae | Manihot esculenta | - | Annotated Genome |
| 19 | Core Eudicots/Rosids | Fabales | Fabaceae | Medicago truncatula | - | Annotated Genome |
| 20 | Core Eudicots/Rosids | Fabales | Fabaceae | Glycine max | - | Annotated Genome |
| 21 | Core Eudicots/Rosids | Cucurbitales | Cucurbitaceae | Cucumis sativus | - | Annotated Genome |
| 22 | Core Eudicots/Rosids | Vitales | Vitaceae | Vitis vinifera | - | Annotated Genome |
| 23 | Core Eudicots/Rosids | Zygophyllales | Zygophyllaceae | Larrea divaricata | UDUT | |
| 24 | Core Eudicots/Rosids | Rosales | Urticaceae | Boehmeria nivea | ACFP | |
| 25 | Core Eudicots/Rosids | Malvales | Malvaceae | Hibiscus cannabinus | OLXF | |
| 26 | Core Eudicots/Asterids | Gentianales | Apocynaceae | Allamanda cathartica | MGVU | Awaiting assembly of top-off |
| 27 | Core Eudicots/Asterids | Gentianales | Apocynaceae | Catharanthus roseus | UOYN | |
| 28 | Core Eudicots/Asterids | Lamiales | Lamiaceae | Rosmarinus officinalis | FDMM | |
| 29 | Core Eudicots/Asterids | Lamiales | Phrymaceae | Mimulus guttatus | - | Annotated Genome |
| 30 | Core Eudicots/Asterids | Solanales | Convolvulaceae | Ipomoea purpurea | multi-library | Multi-library assembly in progress |
| 31 | Core Eudicots/Asterids | Ericales | Ebenaceae | Diospyros malabarica | KVFU | |
| 32 | Core Eudicots/Asterids | Asterales | Asteraceae | Inula helenium | AFQQ | |
| 33 | Core Eudicots/Asterids | Asterales | Asteraceae | Tanacetum parthenium | DUQG | |
| 34 | Monocots | Acorales | Acoraceae | Acorus americanus | - | MonAToL |
| 35 | Monocots | Dioscoreales | Dioscoreaceae | Dioscorea villosa | OCWZ | |
| 36 | Monocots | Liliales | Colchicaceae | Colchicum autumnale | NHIX | |
| 37 | Monocots | Liliales | Smilacaceae | Smilax bona-nox | MWYQ | Sequencing in progress |
| 38 | Monocots | Asparagales | Asparagaceae | Yucca filamentosa | ICNN | |
| 39 | Monocots/Commelinids | Arecales | Arecaceae | Chamaedorea seifrizii | - | MonAToL |
| 40 | Monocots/Commelinids | Poales | Poaceae | Zea mays | - | Annotated Genome |
| 41 | Monocots/Commelinids | Poales | Poaceae | Sorghum bicolor | - | Annotated Genome |

42

Monocots/Commelinids

Poales

Poaceae

Brachypodium distachyon