Gene frequency distributions reject a neutral model of genome evolution
Alexander E. Lobkovsky1, Yuri I. Wolf1 and Eugene V. Koonin1,*
1National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894
*Corresponding author: E-mail: firstname.lastname@example.org
Received October 3, 2012.
Revision received December 7, 2012.
Accepted January 4, 2013.
Evolution of prokaryotes involves extensive loss and gain of genes, which lead to substantial differences in gene repertoires even among closely related organisms. Across a wide range of phylogenetic depths, gene frequency distributions in prokaryotic pangenomes show a characteristic asymmetrical U-shape, with a core of (nearly) universal genes, a “shell” of moderately common genes, and a “cloud” of rare genes. We employ mathematical modeling to investigate evolutionary processes that might underlie this universal pattern. Gene frequency distributions for almost 400 groups of 10 bacterial or archaeal species, spanning a broad range of evolutionary distances, were fit to steady-state, infinite-allele models based on the distribution of gene replacement rates and the phylogenetic tree relating the species in each group. The fits of the theoretical frequency distributions to the empirical ones yield model parameters and estimates of the goodness of fit. Using the Akaike Information Criterion, we show that the neutral model of genome evolution, with the same replacement rate for all genes, can be confidently rejected. Of the three tested models with purifying selection, the one in which the distribution of replacement rates is derived from a stochastic population model with additive per-gene fitness yields the best fits to the data. The selection strength estimated from the fits declines with evolutionary divergence while staying well outside the neutral regime. These findings indicate that, unlike some other universal distributions of genomic variables, e.g., the distribution of paralogous gene family membership, the gene frequency distribution is substantially affected by selection.
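As a rough illustration of the two quantities the abstract relies on, the sketch below tabulates a gene frequency distribution from a presence/absence matrix and compares two models with the Akaike Information Criterion, AIC = 2k − 2 ln L. It is not the authors' fitting procedure: the function names, the toy matrix, the log-likelihood values, and the parameter counts are all hypothetical and chosen only to show the mechanics.

```python
from collections import Counter


def gene_frequency_spectrum(presence, n_genomes):
    """Return a list whose (k-1)-th entry is the number of gene
    families present in exactly k of the n_genomes genomes."""
    counts = Counter(sum(row) for row in presence)
    return [counts.get(k, 0) for k in range(1, n_genomes + 1)]


def aic(log_likelihood, n_params):
    """Akaike Information Criterion; the lower value is preferred."""
    return 2 * n_params - 2 * log_likelihood


# Toy presence/absence matrix (hypothetical data):
# one row per gene family, one column per genome.
presence = [
    [1, 1, 1, 1],  # "core" gene, present in every genome
    [1, 1, 1, 1],  # "core" gene
    [1, 1, 0, 0],  # "shell" gene, moderately common
    [1, 0, 0, 0],  # "cloud" gene, present in one genome
    [0, 0, 1, 0],  # "cloud" gene
    [0, 1, 0, 0],  # "cloud" gene
]
spectrum = gene_frequency_spectrum(presence, n_genomes=4)

# Hypothetical maximized log-likelihoods and parameter counts
# for a neutral model and a model with selection.
aic_neutral = aic(log_likelihood=-120.0, n_params=1)
aic_selection = aic(log_likelihood=-100.0, n_params=2)
preferred = "selection" if aic_selection < aic_neutral else "neutral"
```

With this toy matrix the spectrum is high at both ends and low in the middle, which is the U-shape described above; the extra parameter of the selection model is penalized by the AIC but is outweighed here by the assumed gain in likelihood.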
Key words: gene frequency distribution, steady genome model, goodness of fit, evolution mechanisms.
Published by Oxford University Press 2013.