Dataset collection
- Seed size data from Moles, Ackerly, et al., "A brief history of seed size" [the original data do not appear to be available online, though]
- C-values (genome size) from Kew
- GBIF locality info
- Lifemapper predicted locality and climate envelope estimates
- Indices of human interest in a species. For example, query a search engine API to count how many web pages mention a particular species, or query NCBI to find out how much data GenBank holds for it. This info could be used in a couple of ways. For treeviz, when you collapse clades, you can use it to pick the most popular species as the representative name. For traits, you could reconstruct this interest on the tree. For example, if certain groups are relatively undersampled in GenBank, you could see it because such a clade would be reconstructed red (low number of sequences), while the Amborella clade would be bright blue. I think Mike Sanderson had something like this for NSF funding for plant groups, but I'm not sure.
- Merging trees from tree repositories with data from analysis repositories (e.g., Interaction Web DB).
- From Ginger Jui: I'm a graduate student in David Ackerly's lab. In the attached files are trees and functional traits for:
and the Hawaiian silverswords.
The trees are newick strings and the traits are in .csv format. These datasets were used in David's 2009 PNAS paper, which contains the references for the trees and data.
Ackerly, D. 2009. Conservatism and diversification of plant functional traits: Evolutionary rates versus phylogenetic signal. PNAS 106:19699-19706.
These are moderately sized trees (10-30 taxa). I put these together because the references are all wrapped up in a single paper, but I can also scrounge up some bigger trees if needed.
- David D. Ackerly and Peter B. Reich, “Convergence and correlations among leaf size and function in seed plants: a comparative test using independent contrasts,” Am. J. Bot. 86, no. 9 (September 1, 1999): 1272-1281. Uses Contrast to remove the observed correlation between leaf life span and lamina area.
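
The "human interest" idea above (naming a collapsed clade after its most popular species) could be sketched roughly as follows. This is only an illustration: the function name `representative_name`, the alphabetical tie-break, and all of the counts are made up here, and the interest scores are assumed to have already been fetched (from a search engine or GenBank) into a plain dictionary.

```python
from typing import Dict, List


def representative_name(clade_tips: List[str], interest: Dict[str, int]) -> str:
    """Pick the tip with the highest interest score to label a collapsed clade.

    Tips missing from the interest index count as zero.
    Ties are broken alphabetically so the result is deterministic.
    """
    if not clade_tips:
        raise ValueError("clade has no tips")
    # Sort key: highest score first (negated), then name in ascending order.
    return min(clade_tips, key=lambda name: (-interest.get(name, 0), name))


# Hypothetical counts, for illustration only (not real search or GenBank numbers).
interest = {
    "Arabidopsis thaliana": 900,
    "Amborella trichopoda": 120,
    "Capsella rubella": 40,
}
clade = ["Capsella rubella", "Arabidopsis thaliana", "Boechera stricta"]
print(representative_name(clade, interest))
```

The same dictionary of scores could also be treated as a continuous trait and mapped onto the tree, which is the second use described above.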
50K Synthetic Data (from Brian)
50K Synthetic Tree (from Val Tannen)
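
For datasets that arrive as a Newick string plus a .csv trait table (as in the trees and traits from Ginger Jui above), a first sanity check is whether the tip labels and the trait rows line up. The sketch below is one minimal way to do that with the standard library only. It assumes unquoted tip labels without spaces and a trait column named `species`; both are assumptions, not something the datasets are known to guarantee, and the tiny tree and table are invented for the example.

```python
import csv
import io
import re


def newick_tips(newick: str) -> list:
    """Return tip labels from a simple Newick string.

    A tip label is taken to be any unquoted label that directly follows
    "(" or ",". Internal node labels follow ")" and are therefore skipped.
    Quoted labels are NOT handled by this sketch.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def match_traits(newick: str, csv_text: str, name_column: str = "species"):
    """Split tips and trait rows into matched, tips lacking traits, and unused rows."""
    tips = newick_tips(newick)
    rows = {row[name_column]: row for row in csv.DictReader(io.StringIO(csv_text))}
    tip_set = set(tips)
    matched = {tip: rows[tip] for tip in tips if tip in rows}
    missing_traits = [tip for tip in tips if tip not in rows]
    unused_rows = [name for name in rows if name not in tip_set]
    return matched, missing_traits, unused_rows


# Invented example data, for illustration only.
tree = "((sp_a:1.0,sp_b:1.0):0.5,sp_c:1.5);"
traits = "species,leaf_area\nsp_a,12.5\nsp_b,8.1\nsp_d,3.3\n"

matched, missing_traits, unused_rows = match_traits(tree, traits)
print(sorted(matched))   # tips that have trait data
print(missing_traits)    # tips in the tree with no trait row
print(unused_rows)       # trait rows with no tip in the tree
```

For real analyses a proper Newick parser (one that handles quoted labels and comments) would be a safer choice than the regular expression used here.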