This is a three-part tutorial. Part 1 focuses on the generation of PI profiles.
Part 2 focuses on advanced features, visualizations, and calculations of quartet resolution probabilities.
Part 3 focuses on visualization.
PhyInformR is easy to install. Simply install it via CRAN, or download the compressed R script from GitHub and install it manually.
## CRAN install
install.packages("PhyInformR")
library(PhyInformR)
To install from GitHub:
library(devtools)
install_github("carolinafishes/PhyInformR")
library(PhyInformR)
FOR WINDOWS USERS - devtools will not install dependencies for certain versions of Windows. This is being addressed and should be fixed in the next major release. If your install fails, please first install the dependencies through CRAN and then use devtools as above for the final install.
install.packages("doParallel")
install.packages("phytools")
install.packages("splines") # splines ships with base R, so this call can be skipped
install.packages("gplots")
install.packages("RColorBrewer")
install.packages("foreach")
install.packages("iterators")
install.packages("geiger")
install.packages("gridExtra")
install.packages("hexbin")
install.packages("PBSmodelling")
install.packages("ggplot2")
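If you would rather install only the dependencies that are missing, a minimal sketch along the following lines may help. It uses only base R (`installed.packages()` and `install.packages()`); the helper names `missing_packages` and `install_missing` are hypothetical and are not part of PhyInformR.

```r
# Dependencies listed above (splines is omitted because it ships with base R)
deps <- c("doParallel", "phytools", "gplots", "RColorBrewer", "foreach",
          "iterators", "geiger", "gridExtra", "hexbin", "PBSmodelling",
          "ggplot2")

# Hypothetical helper: return the packages in `pkgs` that are not yet installed
missing_packages <- function(pkgs) {
  pkgs[!pkgs %in% rownames(installed.packages())]
}

# Hypothetical helper: install whatever is missing and return those names
install_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}
```

Calling `install_missing(deps)` before the devtools step should then only download what is not already on your system.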
Once you load PhyInformR, set the number of cores at the start of your session to enable later parallel processing if desired.
We will also be hosting more sample data through Zenodo archives and GitHub to explore new features as we develop them, so check back often!
PhyInformR is built upon the efforts of several other R packages, including:
phytools, splines, gplots, RColorBrewer, foreach, iterators, geiger, doParallel, gridExtra, hexbin, ggplot2, and PBSmodelling.
Several functions in PhyInformR use parallel processing. Enable this via:
library(doParallel)
# set the number of cores if you are working in parallel
registerDoParallel(cores=8)
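The value `cores=8` above suits an eight-core machine. As a hedged alternative, the sketch below picks the core count from the machine itself using `detectCores()` from the base `parallel` package; leaving one core free is an assumption about what is convenient, not a PhyInformR requirement, and `n_cores` is a name chosen here for illustration.

```r
library(parallel)  # ships with base R

# detectCores() can return NA on some platforms, so fall back to 1
available <- detectCores()
if (is.na(available)) available <- 1L

# Leave one core free for the rest of the system, but never go below 1
n_cores <- max(1L, available - 1L)

# Register the parallel backend only if doParallel is installed
if (requireNamespace("doParallel", quietly = TRUE)) {
  doParallel::registerDoParallel(cores = n_cores)
}
```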
Now set your working directory so that output files are saved where you expect them.
Townsend’s phylogenetic informativeness profiles are a visual tool that enables assessment of the predicted utility of a given sequence for phylogenetic inference across a timescale of interest. Use of this method requires two inputs: site rates and a guide tree.
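For intuition, Townsend (2007) gives the informativeness of a single site evolving at rate λ, evaluated at time T, as 16λ²T·exp(-4λT), and a profile sums this quantity over sites. The base R sketch below evaluates that formula directly. It is an illustration only, not PhyInformR's internal implementation; the function names `site_pi` and `pi_profile` and the example rates are hypothetical.

```r
# Per-site phylogenetic informativeness (Townsend 2007) at time `time`
site_pi <- function(rate, time) {
  16 * rate^2 * time * exp(-4 * rate * time)
}

# Profile for a set of sites: sum the per-site values at each time point
pi_profile <- function(rates, times) {
  vapply(times, function(t) sum(site_pi(rates, t)), numeric(1))
}

# Hypothetical site rates and a time axis in the same units as the guide tree
rates <- c(0.00034, 0.005678, 0.008967)
times <- seq(0, 500, by = 1)

profile <- pi_profile(rates, times)

# Time at which the summed profile is highest
peak_time <- times[which.max(profile)]
```

Under this formula a single site peaks at T = 1/(4λ), which is why fast sites are most informative near the present and slow sites deeper in the tree.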
Site rates can be obtained through a variety of software applications such as HyPhy, Rate4Site, or DNArates. The PhyDesign web interface [2] makes quantifying site rates easy:
1) Navigate to http://phydesign.townsend.yale.edu/
2) Upload an alignment and ultrametric tree
3) Choose your program for estimating rates from a dropdown
4) Wait for the email that your results are ready
Once you have site rates, use the “c” function in R to combine them into a vector. You are then ready to explore your data.
mysiterates<-c(0.00034, 0.005678, 0.0,..., 0.008967)
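Typing thousands of rates by hand is impractical, so you may prefer to read them from a file. The sketch below assumes the rates were saved as plain text, one value per line; that layout and the file name are assumptions, so adjust them to match what your rate-estimation program actually wrote. A temporary file stands in for your real file so that the example is self-contained.

```r
# Stand-in for your own rates file (hypothetical contents, one rate per line)
rate_file <- tempfile(fileext = ".txt")
writeLines(c("0.00034", "0.005678", "0.0", "0.008967"), rate_file)

# scan() reads whitespace-separated numbers into a numeric vector
mysiterates <- scan(rate_file, what = numeric(), quiet = TRUE)

# Basic sanity checks before using the rates
stopifnot(is.numeric(mysiterates), !anyNA(mysiterates), all(mysiterates >= 0))
```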
For this walkthrough, we will be using the avian tree and site rates from Prum et al. [3] that are distributed with PhyInformR.
# read.tree() is provided by the ape package
tree <- read.tree(system.file("extdata","Prumetal_timetree.phy",package="PhyInformR"))
rr <- as.matrix(prumetalrates)
informativeness.profile(rr, tree, codon="FALSE", values="off")
Easy! Now you can make phylogenetic informativeness profiles (Townsend 2007).
To obtain PI profiles for each codon position, set codon="TRUE" if your alignment is in reading frame.
If you would like PhyInformR to output branching times and PI values, simply set values="on".
Let’s do something different and partition the data by site rates. First we will view the rates:
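One simple way to view the distribution is a histogram. The sketch below uses base R `hist()` on simulated rates so that it runs on its own; the vector `rates` and its exponential distribution are hypothetical stand-ins, and with the tutorial data you would pass `rr` instead.

```r
# Hypothetical rates drawn from an exponential distribution (mean 0.002)
set.seed(1)
rates <- rexp(1000, rate = 500)

# Histogram of the rate distribution
hist(rates, breaks = 50,
     xlab = "Site rate", main = "Distribution of site rates")

# Fraction of sites faster than a candidate threshold
prop_fast <- mean(rates > 0.003)
```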
We can see a bit of a tail going out, so let’s see what happens when we partition the data by rates above and below 0.003. We’ll start by creating some partitions.
By defining rate-based breaks in our data, we can see the PI of “fast” versus “slow” sites.
# each vector holds the minimum and maximum rate of one partition
lower <- c(0, 0.003)
upper <- c(0.003000001, 10)
breaks <- cbind(lower, upper)
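To check what a breaks matrix will do before plotting, it can help to count how many sites fall into each partition. The sketch below reads each column of `breaks` as one partition's minimum and maximum rate, which is how the example above is laid out; whether PhyInformR treats the boundaries as inclusive is an assumption here, and the `rates` vector is hypothetical.

```r
# Breaks matrix as defined above: one column per partition (min, max)
lower <- c(0, 0.003)
upper <- c(0.003000001, 10)
breaks <- cbind(lower, upper)

# Hypothetical site rates
rates <- c(0.0001, 0.002, 0.003, 0.0045, 0.02)

# Count the sites whose rate lies within each partition's bounds (inclusive)
sites_per_partition <- apply(breaks, 2, function(b) {
  sum(rates >= b[1] & rates <= b[2])
})
```

A partition with very few sites will give a noisy profile, so this is a quick check that the threshold is sensible.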
PhyInformR has a function allowing profiles to be broken along any point in the rate vector, to assess changes in phylogenetic informativeness associated with thresholding the dataset at that rate.
multi.profile(rr, tree, breaks)
Partition 1 represents the slower site rates. As expected, the decay in phylogenetic informativeness for partition 1 is much slower across the tree than for partition 2. Conversely, we can see the faster sites in partition 2 are informative for recent divergences, yet exhibit a rapid decline in informative site patterns as we move to deeper portions of the tree.
The above examples serve to illustrate what PhyInformR does, but this approach is not common practice. Instead, it is more common to work with character sets partitioned by the loci you wish to evaluate. In this case, simply use the same approach as above to define your loci and use defined.multi.profile.
In this example we will compare locus 1, which spans sites 1-1594 in the alignment, and locus 2, which spans sites 1595-2787.
# each vector holds the first and last alignment site of one locus
Lower <- c(1, 1594)
Upper <- c(1595, 2787)
Breaks <- cbind(Lower, Upper)
defined.multi.profile(rr, tree, Breaks, values="off")
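Before comparing profiles, you may want to confirm that the site ranges cover the loci you intend. The sketch below splits a rate vector by the same start and end coordinates and summarizes each locus. It uses simulated rates so that it runs on its own; the `rates` vector is hypothetical, and the subsetting is an illustration rather than what defined.multi.profile does internally.

```r
# Locus coordinates as defined above: one column per locus (first, last site)
Lower <- c(1, 1594)
Upper <- c(1595, 2787)
Breaks <- cbind(Lower, Upper)

# Hypothetical site rates for an alignment of 2787 sites
set.seed(42)
rates <- rexp(2787, rate = 500)

# Extract the rates belonging to each locus
locus_rates <- lapply(seq_len(ncol(Breaks)), function(i) {
  rates[Breaks[1, i]:Breaks[2, i]]
})

# Number of sites and mean rate per locus
locus_sizes <- vapply(locus_rates, length, integer(1))
locus_means <- vapply(locus_rates, mean, numeric(1))
```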
In this example the two loci are very similar.
Using this logic we can break datasets into any size partition we wish to evaluate. Feel free to give this a whirl with other included trees on the github repo from one of our recent studies (Dornburg et al. 2015; Dornburg et al. 2014) to get comfortable. Now, how about visualizing signal or noise probabilities across a tree?