50K_Synthetic_Data
From Brian Omeara:
Here's a sample 50K-taxon tree to play with for testing program capabilities.
- 50K_final_newick.tre is a Newick-formatted tree (just parentheses, labels, branch lengths)
- 50K_final_tree.nex is a tree in NEXUS format
- 50K_final_continuous.nex has two continuous characters (should be correlated) but no tree
- 50K_final_discrete.nex has two binary characters
The tree is ultrametric and should be binary, with no zero-length branches. It was made by stitching together (with Perl) 500 trees, each with 100 taxa, onto a basal phylogeny with 500 tips. Each component tree was generated under a Yule model, but the combined tree can't be said to have evolved under a Yule model.
The characters weren't simulated up the tree. Instead, the states of taxonX were simulated starting from the states of taxonX-1, so some correlation is introduced anyway, since taxa with neighboring numbers are often in the same 100-taxon clade. This should be good enough for trying to run the files through various pieces of software.
FYI, Mesquite (<http://www.mesquiteproject.org>) can open the files (this requires the high-memory version and some patience), so you can use it to view them and convert them to other formats.
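To make the chain-style character simulation described above concrete, here is a minimal Python sketch. It is not the original R or Perl code: the function name `simulate_chain`, the Gaussian step for the continuous character, and the flip probability for the binary character are all assumptions chosen for illustration. It only shows the idea that each taxon's states are derived from the previous taxon's states rather than from the tree.

```python
import random

def simulate_chain(n_taxa, step_sd=1.0, flip_prob=0.05, seed=0):
    """Return (continuous, binary) state lists with one entry per taxon.

    Hypothetical illustration: taxon i's states are derived from the
    states of taxon i-1, not simulated along the phylogeny.
    """
    rng = random.Random(seed)
    continuous = [0.0]  # assumed starting value for the first taxon
    binary = [0]        # assumed starting state for the first taxon
    for _ in range(1, n_taxa):
        # Continuous character: previous value plus Gaussian noise.
        continuous.append(continuous[-1] + rng.gauss(0.0, step_sd))
        # Binary character: copy the previous state, flipping it
        # with a small probability.
        prev = binary[-1]
        binary.append(1 - prev if rng.random() < flip_prob else prev)
    return continuous, binary

cont, disc = simulate_chain(50_000)
```

Because consecutively numbered taxa mostly fall in the same 100-taxon clade, states generated this way end up loosely clustered by clade even though the tree itself is never consulted.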
- I'm also attaching the R code used to generate the component trees (the function "bigcontinuoussaving" was used; note that the file contains a lot of extraneous material from aborted attempts to introduce true character simulation) and the Perl script that assembles them and creates the data files.
- The stitching-together approach was used after I observed that the time to complete a tree simulation was increasing far faster than linearly with the number of taxa, and I knew that the final assembly with Perl would be very fast.
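The stitching step can be sketched in a few lines of Python. This is a hedged illustration rather than the attached Perl script: the helper names `stitch` and `tip_labels`, the toy labels, and the regular expression are assumptions, and the sketch assumes Newick strings without whitespace or quoted labels. Each backbone tip label is replaced by the corresponding component tree, so the backbone's branch length to that tip becomes the stem of the grafted clade.

```python
import re

# A tip label follows "(" or "," and is followed by ":", "," or ")".
# Branch lengths follow ":" and internal labels follow ")", so neither matches.
TIP = re.compile(r"(?<=[(,])([^():,;]+)(?=[:,)])")

def stitch(backbone, subtrees):
    """Replace each backbone tip label with the matching subtree.

    backbone: Newick string for the basal phylogeny.
    subtrees: dict mapping a backbone tip label to a Newick string.
    """
    def graft(match):
        label = match.group(1)
        if label in subtrees:
            # Drop the trailing ";" so the subtree nests inside the backbone.
            return subtrees[label].rstrip().rstrip(";")
        return label  # tips without a component tree are left unchanged
    return TIP.sub(graft, backbone)

def tip_labels(newick):
    """Return the tip labels of a Newick string, in order of appearance."""
    return TIP.findall(newick)

# Toy example: a 2-tip backbone and two 2-taxon component trees.
backbone = "(A:1,B:1);"
subtrees = {
    "A": "(a1:0.5,a2:0.5);",
    "B": "(b1:0.5,b2:0.5);",
}
combined = stitch(backbone, subtrees)
print(combined)              # ((a1:0.5,a2:0.5):1,(b1:0.5,b2:0.5):1);
print(tip_labels(combined))  # ['a1', 'a2', 'b1', 'b2']
```

The assembly is a single pass of string substitution over the backbone, which is consistent with it being very fast compared with simulating one large tree. Note that the result is ultrametric only if the backbone is ultrametric and all component trees have the same root-to-tip height; this sketch does not rescale anything.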