The quantitative framework of quartet internode calculations lends itself wonderfully to the development of new ways to visualize information in a given dataset. This section highlights graphics from a few recent publications
Hwang et al. 2015 depicted QIRP across an entire tree by plotting the QIRP of a marker for each node simultaneously. This plot can be drawn by providing a tree, rate vector, and state space as in section 3. For this example, we will use a single tree from Near et al. 2014 along with site rates from the same study
library(ape) read.tree(system.file("extdata","polypterus_trees.phy",package="PhyInformR"))->tree as.matrix(rag1)->rate_vector tree[]->bichir_tree PlotTreeSI(bichir_tree,rate_vector,3)
Here the branch lengths (x axis, time) and the blue lines (y axis, QIRP) match up and we can see that a similar trend to the one at the end of the previous section: small recent internode have low QIRP and are predicted to be impacted by homoplasy or contain little information
This approach to visualization can be quite handy when comparing markers, so leave this graphics window open for now
Visualizing Phylogenetic Experimental Design
Say we are interested choosing a new marker to sequence for bichirs.
We can compare the predicted utility of rag1 to candidate markers using this plotting method. In this example we will compare rag1 with the first locus from Prum et al. 2015 We’ll start by isolating the locus
Now we can compare to the above plot as follows
In the above plot, we have added QIRP to the previous plot on the Y axis with time still on the X axis. Blue lines correspond to the QIRP values of the tree internodes for rag1 while red lines correspond to our candidate locus. This visualization reveals that the candidate marker is predicted to be of higher utility for resolving every node in the bichir tree. This sort of visualization can be a great heuristic for choosing probe sets or primers for cost and time effective sequencing of markers
This form of visualization also displays overall trends of markers over time, and can help disentangle sources of error in tree inference. However, the above method requires a fixed internode length. Prum et al. 2015 provided an alternative visualization that accommodates for uncertainty in internode length
Note that here we are using the files from Prum et al. (2015), which are available on Zenodo (DOI 10.5281) though you could also substitute site rates from your own markers here.
Download the above files and open prumetal_heatmap_rosetta.r that is hosted on the github repository with phyinformR
Navigate to the “Phylogenetic_Informativeness” part of the downloaded directory
Change the setwd() function at the top of the rosetta file to the directory path, save, then source
This file translates the rates into locus specific rate matrices for you to explore. For this tutorial we will sort some loci by length to ask whether size of the locus is associated with increased information content. We are going to visualize a random subset of loci for the sake of keeping the tutorial manageable, though feel free to explore the full data further!
length(L216)->leL216 length(L182)->leL182 length(L149)->leL149 length(L213)->leL213 length(L184)->leL184 length(L223)->leL223 length(L305)->leL305 length(L8)->leL8 length(L203)->leL203 length(L95)->leL95 length(L41)->leL41 length(L295)->leL295 length(L107)->leL107 length(L163)->leL163 length(L21)->leL21 length(L2)->leL2 length(L325)->leL325 length(L80)->leL80 length(L28)->leL28 c(leL216,leL182,leL149,leL213,leL184,leL223,leL305,leL8,leL203,leL95,leL41,leL295,leL107, leL163,leL21,leL2,leL325,leL80,leL28)->ll names(ll)<-c("leL216","leL182","leL149","leL213","leL184","leL223","leL305","leL8","leL203","leL95", "leL41","leL295","leL107","leL163","leL21","leL2","leL325","leL80","leL28") sort(ll)
The guide tree from Prum et al. 2015 used relative rates (root height=1), so we are going to look at resolving Neoaves with a crown at .30. We begin by calculating the signal for different internode lengths. Note that the loci were already formatted as matrices in the Prum et al. 2015 script.
space.maker(allrates,.30,3)->a1 space.maker(L213,.30,3)->a2 space.maker(L182,.30,3)->a3 space.maker(L203,.30,3)->a4 space.maker(L149,.30,3)->a5 space.maker(L2,.30,3)->a6 space.maker(L295,.30,3)->a7 space.maker(L325,.30,3)->a8 space.maker(L107,.30,3)->a9 space.maker(L41,.30,3)->a10 space.maker(L216,.30,3)->a11 space.maker(L28,.30,3)->a12 space.maker(L184,.30,3)->a13 space.maker(L8,.30,3)->a14 space.maker(L95,.30,3)->a15 space.maker(L80,.30,3)->a16 space.maker(L163,.30,3)->a17 space.maker(L21,.30,3)->a18 space.maker(L305,.30,3)->a19 space.maker(L223,.30,3)->a20
We now take this output and assemble our heatmaps.
rbind(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20)->demo as.matrix(demo)->demo2 row.names(demo2)<- c("allrates","L213","leL182","leL203","leL149","leL2","leL295","leL325","leL107","leL41","l eL216","leL28", "leL184","leL8","leL95","leL80","leL163","leL21","leL305","leL223") .30/20->by.this seq(by.this,0.30-0.0001,by=by.this)->lilts colnames(demo2)<-lilts heatmap.2(demo2, Colv=F,Rowv=F, scale='none')
This heatmap shows us the probability of correct resolution for increasingly small internodes. This map can be generated for other loci and the internode distances can be zoomed in on using the space.maker.narrow function.
Dornburg, A., M. Friedman, and T. J. Near. 2015. Phylogenetic analysis of molecular and morphological data highlights uncertainty in the relationships of fossil and living species of Elopomorpha (Actinopterygii: Teleostei). Molecular Phylogenetics and Evolution 89:205-218.
Dornburg, A., A. J. Moore, J. M. Beaulieu, R. I. Eytan, and T. J. Near. 2014. The impact of shifts in marine biodiversity hotspots on patterns of range evolution: evidence from the Holocentridae (squirrelfishes and soldierfishes). Evolution 69:146-161.
Graybeal, A. “The Phylogenetic Utility of Cytochrome B: Lessons from Bufonid Frogs” Molecular phylogenetics and evolution 1993;: 256–269.
Harmon LJ, Weir JT, Brock CD, Glor RE, Challenger W. GEIGER: investigating evolutionary radiations. Bioinformatics. 2008;24:129–31.
Hwang, J., Q. Zhao, Z. L. Yang, Z. Wang, and J. P. Townsend. 2015. Solving the ecological puzzle of mycorrhizal associations using data from annotated collections and environmental samples–an example of saddle fungi. Environmental microbiology reports 7:658-667.
Lanfear, R., B. Calcott, S. Y. W. Ho, and S. Guindon. 2012. PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Molecular Biology and Evolution 29:1695-1701.
Lanfear, R., B. Calcott, D. Kainer, C. Mayer, and A. Stamatakis. 2014. Selecting optimal partitioning schemes for phylogenomic datasets. BMC Evolutionary Biology 14:82.
McCallum E, Weston S. Parallel R. “O’Reilly Media, Inc.”; 2011.
Near, T. J., A. Dornburg, M. Tokita, D. Suzuki, M. C. Brandley, and M. Friedman. 2014. Boom and bust: ancient and recent diversification in bichirs (Polypteridae: Actinopterygii), a relictual lineage of ray-finned fishes. Evolution 68:1014-1026.
Neuwirth,E.RColorBrewer: ColorBrewer Palettes. 2014. R package version 1.1-2. https://CRAN.R-project.org/package=RColorBrewer
Paradis E, Claude J, Strimmer K. APE: Analyses of Phylogenetics and Evolution in R language. Bioinformatics. 2004;20:289–90.
Posada, D., and K. A. Crandall. 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14:817-818.
Prum, R. O., J. S. Berv, A. Dornburg, D. J. Field, J. P. Townsend, E. Moriarty Lemmon, and A. R. Lemmon. Accepted. A Comprehensive Phylogeny of Birds (Aves) using Targeted Next Generation DNA Sequencing. Nature.
Revell LJ. phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol. Evol. 2011;3:217–23
Su, Z., Z. Wang, F. Lopez-Giraldez, and J. P. Townsend. 2014. The impact of incorporating molecular evolutionary model into predictions of phylogenetic signal and noise. Phylogenetics, Phylogenomics, and Systematics 2:11.
Su Z, Townsend JP. Utility of characters evolving at diverse rates of evolution to resolve quartet trees with unequal branch lengths: analytical predictions of long-branch effects. BMC Evol. Biol. 2015;15:86. Townsend, J. P. 2007. Profiling phylogenetic informativeness. Systematic Biology 56:222-231.
Townsend, J. P., Z. Su, and Y. I. Tekle. 2012. Phylogenetic signal and noise: predicting the power of a data set to resolve phylogeny. Systematic Biology 61:835-849.
Warnes,G.R., Bolker, B., Bonebakker, L., Gentleman, R., Liaw, W.H.A., Lumley, T., Maechler, M., Magnusson, A., Moeller, S., Schwartz, B., Venables B., gplots: Various R Programming Tools for Plotting Data. 2016. R package version 3.0.1. https://CRAN.R-project.org/package=gplots.
Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer; 2016.
Yang, Z. Fair-balance paradox, star-tree paradox, and Bayesian phylogenetics. Molecular biology and evolution 2007; 24(8), pp.1639-1655.