Molecular Phylogenetics and Evolution
Volume 94, Part A, January 2016, Pages 447–462
Implementing and testing the multispecies coalescent model: A valuable paradigm for phylogenomics
Scott V. Edwards a, Zhenxiang Xi a, Axel Janke b, Brant C. Faircloth c, John E. McCormack d, Travis C. Glenn e, Bojian Zhong f, Shaoyuan Wu g, Emily Moriarty Lemmon h, Alan R. Lemmon i, Adam D. Leaché j, Liang Liu k, Charles C. Davis a
a Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
b Senckenberg Biodiversity and Climate Research Centre, Senckenberg Gesellschaft für Naturforschung, Senckenberganlage 25, D-60325 Frankfurt am Main, Germany
c Department of Biological Sciences and Museum of Natural Science, Louisiana State University, Baton Rouge, LA 70803, USA
d Moore Laboratory of Zoology, Occidental College, Los Angeles, CA 90041, USA
e Department of Environmental Health Science, University of Georgia, Athens, GA 30602, USA
f College of Life Sciences, Nanjing Normal University, Nanjing 210023, China
g Department of Biochemistry and Molecular Biology & Tianjin Key Laboratory of Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
h Department of Biological Science, Florida State University, Tallahassee, FL 32306, USA
i Department of Scientific Computing, Florida State University, Tallahassee, FL 32306, USA
j Department of Biology & Burke Museum of Natural History and Culture, University of Washington, Seattle, WA 98195, USA
k Department of Statistics and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
Available online 27 October 2015
Under a Creative Commons license
Schematic of the recombination process along lineages of a species tree
In recent articles published in Molecular Phylogenetics and Evolution, Mark Springer and John Gatesy (S&G) present numerous criticisms of recent implementations and testing of the multispecies coalescent (MSC) model in phylogenomics, popularly known as “species tree” methods. After pointing out errors in alignments and gene tree rooting in recent phylogenomic data sets, particularly in Song et al. (2012) on mammals and Xi et al. (2014) on plants, they suggest that these errors seriously compromise the conclusions of these studies. Additionally, S&G enumerate numerous perceived violated assumptions and deficiencies in the application of the MSC model in phylogenomics, such as its assumption of neutrality and in particular the use of transcriptomes, which are deemed inappropriate for the MSC because the constituent exons often subtend large regions of chromosomes within which recombination is substantial. We acknowledge these previously reported errors in recent phylogenomic data sets, but disapprove of S&G’s excessively combative and taunting tone. We show that these errors, as well as two nucleotide sorting methods used in the analysis of Amborella, have little impact on the conclusions of those papers. Moreover, several concepts introduced by S&G and an appeal to “first principles” of phylogenetics in an attempt to discredit MSC models are invalid and reveal numerous misunderstandings of the MSC. Contrary to the claims of S&G, we show that recent computer simulations used to test the robustness of MSC models are not circular and do not unfairly favor MSC models over concatenation.
In fact, although both concatenation and MSC models clearly perform well in regions of tree space with long branches and little incomplete lineage sorting (ILS), simulations reveal the erratic behavior of concatenation when subjected to data subsampling and its tendency to produce spuriously confident yet conflicting results in regions of parameter space where MSC models still perform well. S&G’s claims that MSC models explain little or none (0–15%) of the gene tree heterogeneity observed in a mammal data set and that MSC models assume ILS as the only source of gene tree variation are flawed. Overall, many of their criticisms of MSC models are invalidated when concatenation is appropriately viewed as a special case of the MSC, which in turn is a special case of emerging network models in phylogenomics. We reiterate that there is enormous promise and value in recent implementations and tests of the MSC and look forward to its increased use and refinement in phylogenomics.
We thank MPE Editor Derek Wildman for allowing us to write this piece, Vadim Goremykin for valuable discussion, one anonymous reviewer and Ed Braun for constructive criticism on the manuscript, Joe Felsenstein, Monty Slatkin, and Michael Lynch for discussion of Fig. 1, and John Gatesy and Mark Springer for sharing data and details of their work. The writing of this comment was supported by US National Science Foundation (NSF) grants DEB-1120243 to CCD, DEB-1120516 to EML, DEB-1145978 to D. Rokyta, ARL, and EML, and EAR-1355343 to SVE; National Natural Science Foundation of China grant 31570219 and funding from the Priority Academic Program Development of Jiangsu Higher Education Institutions to BZ.
This paper was edited by the Associate Editor Derek Wildman.