Synthesis of phylogeny and taxonomy into a comprehensive tree of life
Cody E. Hinchliff a,1, Stephen A. Smith a,1,2, James F. Allman b, J. Gordon Burleigh c, Ruchi Chaudhary c, Lyndon M. Coghill d, Keith A. Crandall e, Jiabin Deng c, Bryan T. Drew f, Romina Gazis g, Karl Gude h, David S. Hibbett g, Laura A. Katz i, H. Dail Laughinghous IV e,i, Emily Jane McTavish j, Peter E. Midford d, Christopher L. Owen c, Richard H. Ree d, Jonathan A. Rees k, Douglas E. Soltis c,l, Tiffani Williams m, and Karen A. Cranston k,2
a Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109;
b Interrobang Corporation, Wake Forest, NC 27587;
c Department of Biology, University of Florida, Gainesville, FL 32611;
d Field Museum of Natural History, Chicago, IL 60605;
e Computational Biology Institute, George Washington University, Ashburn, VA 20147;
f Department of Biology, University of Nebraska-Kearney, Kearney, NE 68849;
g Department of Biology, Clark University, Worcester, MA 01610;
h School of Journalism, Michigan State University, East Lansing, MI 48824;
i Biological Science, Clark Science Center, Smith College, Northampton, MA 01063;
j Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045;
k National Evolutionary Synthesis Center, Duke University, Durham, NC 27705;
l Florida Museum of Natural History, University of Florida, Gainesville, FL 32611;
m Computer Science and Engineering, Texas A&M University, College Station, TX 77843
Edited by David M. Hillis, The University of Texas at Austin, Austin, TX, and approved July 28, 2015 (received for review December 3, 2014)
Significance
Scientists have used gene sequences and morphological data to construct tens of thousands of evolutionary trees that describe the evolutionary history of animals, plants, and microbes. This study is the first, to our knowledge, to apply an efficient and automated process for assembling published trees into a complete tree of life. This tree and the underlying data are available to browse and download from the Internet, facilitating subsequent analyses that require evolutionary trees. The tree can be easily updated with newly published data. Our analysis of coverage not only reveals gaps in sampling and naming biodiversity but also demonstrates that most published phylogenies are not available in digital formats that can be summarized into a tree of life.
Abstract
Reconstructing the phylogenetic relationships that unite all lineages (the tree of life) is a grand challenge. The paucity of homologous character data across disparately related lineages currently renders direct phylogenetic inference untenable. To reconstruct a comprehensive tree of life, we therefore synthesized published phylogenies, together with taxonomic classifications for taxa never incorporated into a phylogeny. We present a draft tree containing 2.3 million tips—the Open Tree of Life. Realization of this tree required the assembly of two additional community resources: (i) a comprehensive global reference taxonomy and (ii) a database of published phylogenetic trees mapped to this taxonomy. Our open source framework facilitates community comment and contribution, enabling the tree to be continuously updated when new phylogenetic and taxonomic data become digitally available. Although data coverage and phylogenetic conflict across the Open Tree of Life illuminate gaps in both the underlying data available for phylogenetic reconstruction and the publication of trees as digital objects, the tree provides a compelling starting point for community contribution. This comprehensive tree will fuel fundamental research on the nature of biological diversity, ultimately providing up-to-date phylogenies for downstream applications in comparative biology, ecology, conservation biology, climate change, agriculture, and genomics.
Keywords: phylogeny, taxonomy, tree of life, biodiversity, synthesis
Footnotes
1 C.E.H. and S.A.S. contributed equally to this work.
2 To whom correspondence may be addressed. Email: karen.cranston@gmail.com or eebsmith@umich.edu.
Author contributions: C.E.H., S.A.S., J.G.B., R.C., K. A. Crandall, K.G., D.S.H., L.A.K., R.H.R., D.E.S., T.W., and K. A. Cranston designed research; C.E.H., S.A.S., J.G.B., R.C., L.M.C., K. A. Crandall, J.D., B.T.D., R.G., D.S.H., H.D.L., E.J.M., P.E.M., C.L.O., R.H.R., J.A.R., D.E.S., and K. A. Cranston performed research; C.E.H., S.A.S., J.F.A., J.G.B., R.C., L.M.C., E.J.M., P.E.M., R.H.R., and J.A.R. contributed new reagents/analytic tools; C.E.H., S.A.S., J.G.B., R.C., L.M.C., B.T.D., R.G., D.S.H., H.D.L., C.L.O., J.A.R., and D.E.S. analyzed data; C.E.H., S.A.S., J.F.A., J.G.B., R.C., K. A. Crandall, L.A.K., H.D.L., E.J.M., J.A.R., D.E.S., and K. A. Cranston wrote the paper; J.F.A. conducted user interface development; and K.G. provided graphic design.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: The Open Tree of Life taxonomy, the synthetic tree, and processed inputs are available from the Dryad database, dx.doi.org/10.5061/dryad.8j60q.
Freely available online through the PNAS open access option.