Volume 24, Issue 2, 19 February 2021, 102110

Systematic errors in orthology inference and their effects on evolutionary analyses

Paschalis Natsidis¹, Paschalia Kapli¹, Philipp H. Schiffer¹ ², Maximilian J. Telford¹ ³

¹ Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Ecology, University College London, London WC1E 6BT, UK

Received 23 November 2020, Revised 3 January 2021, Accepted 21 January 2021, Available online 28 January 2021.

Published: February 19, 2021


• Presence of shared orthologs across species is used for evolutionary analyses

• We simulated realistic sets of orthologs with no gains or losses

• Errors in predicting shared orthologs correlate with phylogenetic relationships

• Presence/absence datasets based on errors recapitulate findings from empirical data


The availability of complete sets of genes from many organisms makes it possible to identify genes unique to (or lost from) certain clades. This information is used to reconstruct phylogenetic trees, to identify genes involved in the evolution of clade-specific novelties, and for phylostratigraphy (identifying the ages of genes in a given species). These investigations rely on accurately predicted orthologs. Here we use simulation to produce sets of orthologs that experience no gains or losses. We show that errors in identifying orthologs increase with higher rates of evolution. We use the predicted sets of orthologs, with errors, to reconstruct phylogenetic trees, to count gains and losses, and for phylostratigraphy. Our simulated data, containing information only from errors in orthology prediction, closely recapitulate findings from empirical data. We suggest that published downstream analyses must be informed to a large extent by errors in orthology prediction that mimic expected patterns of gene evolution.
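To make the phylostratigraphy argument concrete, the following is a toy sketch, not the simulation used in the paper. It assumes a hypothetical focal species, five made-up phylostrata each represented by one hypothetical species with a made-up divergence distance, and an assumed exponential model in which the chance of detecting a true ortholog falls with rate × distance. Every gene truly has an ortholog in every species (no gains or losses), so any gene that is not assigned to the oldest stratum only looks younger because distant orthologs were missed.

```python
import math
import random
from collections import Counter

# Hypothetical phylostrata for a hypothetical focal species, oldest -> youngest.
# Each entry: (stratum name, representative species, assumed distance to focal species).
STRATA = [
    ("Opisthokonta", ["yeast"], 1.6),
    ("Metazoa", ["sponge"], 1.2),
    ("Bilateria", ["fly"], 0.9),
    ("Vertebrata", ["fish"], 0.5),
    ("Mammalia", ["mouse"], 0.2),
]


def assign_stratum(hit_species, strata=STRATA):
    """Phylostratigraphy rule: a gene's age is the oldest stratum in which
    an ortholog was detected; with no hits it looks species-specific."""
    for name, members, _ in strata:
        if any(sp in hit_species for sp in members):
            return name
    return "species-specific"


def detection_prob(distance, rate):
    # Assumed toy model (not from the paper): the probability of finding a
    # true ortholog decays exponentially with rate x divergence.
    return math.exp(-rate * distance)


def simulate_ages(n_genes, rate, seed=0):
    """Count apparent gene ages when every gene is truly present everywhere."""
    rng = random.Random(seed)
    ages = Counter()
    for _ in range(n_genes):
        # One random draw per species; misses are pure orthology-prediction errors.
        hits = {
            sp
            for _, members, dist in STRATA
            for sp in members
            if rng.random() < detection_prob(dist, rate)
        }
        ages[assign_stratum(hits)] += 1
    return ages
```

Comparing `simulate_ages(1000, rate=0.5)` with `simulate_ages(1000, rate=2.0)` shows that faster-evolving genes are pushed toward younger strata and toward "species-specific", even though no gene was ever gained or lost. The exponential detection curve is chosen only for simplicity; any model in which detection gets harder with divergence produces the same qualitative bias.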