PHYLIP_CONTRAST_Example
Purpose
The purpose of this example is to demonstrate how to use the CONTRAST program in PHYLIP to perform a Phylogenetic Independent Contrasts analysis using continuous character data and pre-made trees. We use a real data set of 2 continuous characters for 49 mammal species, as well as a synthetic data set consisting of a 50K-species tree and data for two continuous characters. This document also introduces the file formats used as inputs for CONTRAST.
Prerequisites
- Access to a unix/linux shell
- PHYLIP installed
- Access to Mesquite
- Nexus data files
Preparing the data files
Input file formats for CONTRAST are described in the documentation. The examples used here have a single representative for each species. Within-species variation is also supported by CONTRAST but is not shown in these examples.
Please note also that the desktop GUI for Mesquite is neither scalable nor suitable for inclusion in a DE; it is used here only to demonstrate the formats.
- The file PDAP.nex was provided by Brian Omeara. It is a NEXUS format file that contains character data. The file header indicates it was generated by Peter Midford.
[written Mon Nov 24 19:18:14 CST 2008 by Mesquite version 2.5 (build j77) at 88.99.124.24.cm.sunflower.com/24.124.99.88 (Peter Midford)]
- Converting the NEXUS file to input suitable for PHYLIP involved a few steps:
1) Load PDAP.nex into Mesquite. The character data are log values of body mass and home range (sample below).
    BEGIN CHARACTERS;
        TITLE Body_mass_and_home_range;
        DIMENSIONS NCHAR=2;
        FORMAT DATATYPE = CONTINUOUS GAP = - MISSING = ?;
        CHARSTATELABELS
            1 log_Body_Mass, 2 log_Home_Range ;
        MATRIX
        Ursus_maritimus    2.423245874   2.062957834
        Ursus_arctos       2.400192489   1.918030337
        Ursus_americanus   1.970346876   1.754348336
        Nasua_narica       0.6434526765  0.0211892991
        Procyon_lotor      0.84509804    0.0569048513
        Mephitis_mephitis  0.3979400087  0.3979400087
        Meles_meles        1.064457989   -0.060480747
        Canis_lupus        1.547774705   2.307067951
2) Export the character data: File->Export->Tab delimited continuous data file->PDAP.txt
3) Export the tree data in PHYLIP (Newick) format: File->Export->Phylip (trees)->PDAP.tree.fel
4) Modify the files for PHYLIP using the ad hoc Perl script below (fix_PDAP.pl). Two things need to be done: make sure the taxon labels are exactly 10 characters long (required by PHYLIP), and make sure they correspond exactly between the character data file (PDAP.txt) and the tree file (PDAP.tree.fel).
    use strict;
    use warnings;

    my %seen;
    while (<>) {
        next unless /\S/;    # skip blank lines
        my ($taxon) = /(\S+)/;
        # pad or truncate label to make it exactly 10 characters
        my $label = (length $taxon) < 10 ? sprintf('%-10s', $taxon)
                                         : substr $taxon, 0, 10;
        # check for duplications caused by label truncation
        # (only the first duplicate of a given label is made unique)
        if ($seen{$label}++) {
            $label =~ s/\S$/1/;
        }
        # \Q...\E keeps any regex metacharacters in the name literal
        s/\Q$taxon\E\s+/$label/;
        # also change the label in the tree file (edited in place)
        system('perl', '-i', '-pe', "s/\Q$taxon\E/$label/", 'PDAP.tree.fel');
        print;
    }
Creating the PHYLIP infile (PDAP.fel):
    $ perl fix_PDAP.pl PDAP.txt >PDAP.fel
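The labeling rule in step 4 (exactly 10 characters, no two taxa sharing a label) can also be sketched in Python. This is only an illustration, not the script used for this example: the function name phylip_labels and the numbered-suffix scheme for resolving collisions are assumptions, chosen so that more than one collision on the same truncated name is handled.

```python
def phylip_labels(taxa, width=10):
    """Map each taxon name to a unique label of exactly `width` characters.

    Hypothetical helper: names are truncated or space-padded to `width`;
    when two names collide, the tail is overwritten with a counter.
    """
    labels = {}
    used = set()
    for taxon in taxa:
        label = taxon[:width].ljust(width)
        n = 1
        while label in used:
            suffix = str(n)
            # Shorten the name just enough to make room for the counter.
            label = (taxon[:width - len(suffix)] + suffix).ljust(width)
            n += 1
        used.add(label)
        labels[taxon] = label
    return labels
```

The same mapping would then be applied to both the character data file and the tree file, so that the labels stay in agreement between the two.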
Running CONTRAST
- Create the command file (command.txt). The series of commands below names the data and tree files and specifies the 'C' option to print out the contrast data.
    PDAP.fel
    PDAP.tree.fel
    C
    Y
Use the Perl script run_phylip.pl to execute CONTRAST and save the results as the file PDAP.contrasts.txt.
    $ ./run_phylip.pl contrast command.txt PDAP.contrasts.txt
    Done. Outfile saved as PDAP.contrasts.txt. Program output saved as 'contrast.out'
    $ more contrast.out
    Continuous character comparative analysis, version 3.69
    Settings for this run:
      W  Within-population variation in data?  No, species values are means
      R  Print out correlations and regressions?  Yes
      C  Print out contrasts?  Yes
      M  Analyze multiple trees?  No
      0  Terminal type (IBM PC, ANSI, none)?  ANSI
      1  Print out the data at start of run  No
      2  Print indications of progress of run  Yes
    Y to accept these or type the letter for one to change
    Output written to file "outfile"
    Done.
    $ head -20 PDAP.contrasts.txt
    Contrasts (columns are different characters)
    --------- -------- --- --------- -----------
          0.00001      0.00007
          0.00015      0.00008
         -0.00006     -0.00001
      -1053.85746    724.82686
          0.00001     -0.00008
          0.00100      0.00115
          0.00019      0.00029
          0.00002     -0.00010
          0.00012      0.00035
          0.00016      0.00041
          0.00007      0.00012
         -0.00004     -0.00022
         -0.00009      0.00025
          0.00001     -0.00027
         -0.00024     -0.00037
         -0.00006     -0.00001
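The run_phylip.pl script itself is not shown on this page. As a rough idea of what such a driver might do, here is a minimal Python sketch. The function name run_phylip and its parameters are hypothetical. It assumes the program reads its menu answers from standard input and writes its results to a file named outfile in the working directory, as the session above suggests. It also assumes that an existing outfile would trigger an extra overwrite prompt, so any stale copy is removed first.

```python
import os
import shutil
import subprocess


def run_phylip(program, command_file, result_file, log_file=None):
    """Run a menu-driven program with its answers taken from a file.

    Hypothetical driver, not the run_phylip.pl used above. `program` is a
    command name such as "contrast" (or an argument list).
    """
    cmd = [program] if isinstance(program, str) else list(program)
    if log_file is None:
        log_file = os.path.basename(cmd[0]) + ".out"
    # Assumption: a leftover outfile would cause an unanswered prompt.
    if os.path.exists("outfile"):
        os.remove("outfile")
    with open(command_file) as commands, open(log_file, "w") as log:
        # The command file supplies the file names and menu choices.
        subprocess.run(cmd, stdin=commands, stdout=log,
                       stderr=subprocess.STDOUT, check=True)
    # Keep the results under a descriptive name.
    shutil.move("outfile", result_file)
    return result_file
```

With CONTRAST on the PATH, a call such as run_phylip("contrast", "command.txt", "PDAP.contrasts.txt") would mirror the command line shown above.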
Testing CONTRAST with a synthetic 50K data set
Original data files
- The original data files, provided by Brian Omeara, are described here. They are synthetic tree and character data.
- 50K_final_continuous.nex contains the character data in NEXUS format.
- 50K_final_newick.tre is the tree file in Newick (accepted by PHYLIP) format. The tree is ultrametric (a rooted additive tree where the terminal nodes are all equally distant from the root), binary (all nodes bifurcate) and has all positive branch lengths (some methods, such as NJ, allow negative branch lengths, which are not suitable for independent contrasts).
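The three tree properties listed above (binary, strictly positive branch lengths, ultrametric) can be checked before running an analysis. The following Python sketch is one way to do it, under stated assumptions: the function name check_tree is hypothetical, the Newick string has no quoted labels or bracketed comments, and every node except the root carries a branch length. It works bottom-up with an explicit stack, so a very deep tree does not hit Python's recursion limit.

```python
import re


def check_tree(newick, tol=1e-6):
    """Return (is_binary, all_positive, is_ultrametric) for a Newick string."""
    tokens = re.findall(r"[(),;]|[^(),;\s]+", newick)
    stack = []          # one list of finished children per open parenthesis
    current = None      # node being read: [min_height, max_height, length]
    binary = positive = True
    for tok in tokens:
        if tok == "(":
            stack.append([])
            current = None
        elif tok in (",", ")"):
            if current is None:              # unnamed leaf without a length
                current = [0.0, 0.0, None]
            lo, hi, blen = current
            if blen is None or blen <= 0:
                positive = False
            step = blen or 0.0
            # Tip heights as seen from the parent node.
            stack[-1].append((lo + step, hi + step))
            current = None
            if tok == ")":
                children = stack.pop()
                if len(children) != 2:
                    binary = False
                current = [min(c[0] for c in children),
                           max(c[1] for c in children), None]
        elif tok == ";":
            break
        else:
            _name, sep, length = tok.partition(":")
            blen = float(length) if sep else None
            if current is None:
                current = [0.0, 0.0, blen]   # a new leaf
            else:
                current[2] = blen            # label of the node just closed
    # Ultrametric: nearest and farthest tips are equally far from the root.
    ultrametric = current is not None and current[1] - current[0] <= tol
    return binary, positive, ultrametric
```

For example, check_tree("((A:1,B:1):1,C:2);") reports all three properties as satisfied, while changing B's branch length to 2 makes the ultrametric check fail.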
Converting to PHYLIP format
No modifications were made to the tree file. The character data were processed as follows:
- open 50K_final_continuous.nex in Mesquite
- export as simple text (File->Export->Tab delimited continuous data file->50K.continuous.txt)
- use ad hoc perl script to create PHYLIP file 50K.continuous.fel
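    # header line: number of species and number of characters
    print " 50000 2\n";
    while (<>) {
        # keep only the data rows, whose names start with "taxon"
        next unless /^taxon/;
        chomp;
        my ($taxon, $s1, $s2) = split;
        # pad the label to exactly 10 characters
        my $label = sprintf('%-10s', $taxon);
        print join("\t", $label, $s1, $s2), "\n";
    }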
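Because the header counts are hard-coded in a script like this, it can be worth checking the resulting file before a long run. The Python sketch below is an illustrative check, not part of the original workflow. The function name check_phylip_continuous is hypothetical, and it assumes one species per line, with the label in the first 10 columns followed by whitespace-separated values.

```python
def check_phylip_continuous(lines, name_width=10):
    """Return a list of problems found in a one-row-per-species file."""
    rows = [ln.rstrip("\n") for ln in lines if ln.strip()]
    try:
        n_species, n_chars = (int(x) for x in rows[0].split())
    except (IndexError, ValueError):
        return ["first line must hold the species and character counts"]
    problems = []
    data = rows[1:]
    if len(data) != n_species:
        problems.append(f"header says {n_species} species, found {len(data)}")
    seen = set()
    for row in data:
        # The label is positional: the first `name_width` columns.
        label, values = row[:name_width], row[name_width:].split()
        if label in seen:
            problems.append(f"duplicate label {label!r}")
        seen.add(label)
        if len(values) != n_chars:
            problems.append(
                f"{label!r}: expected {n_chars} values, found {len(values)}")
        for value in values:
            try:
                float(value)
            except ValueError:
                problems.append(f"{label!r}: non-numeric value {value!r}")
    return problems
```

A typical use would be to open 50K.continuous.fel, pass the file object to the function, and print whatever it returns; an empty list means none of these checks failed.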
Running and benchmarking CONTRAST
- It took CONTRAST ~20s to run the analysis on the 50K taxon data.
- Results are saved as 50K.contrasts.txt
    $ time ./run_phylip.pl contrast command.txt 50K.contrasts.txt
    Done. Outfile saved as 50K.contrasts.txt. Program output saved as 'contrast.out'
    real 0m19.535s
    user 0m18.975s
    sys 0m0.371s