The limits of theoretical population genetics

Friday, September 17, 2010

Genetics, Vol. 169, 1-7, January 2005, Copyright © 2005

The Limits of Theoretical Population Genetics

Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138

1 Address for correspondence: Department of Organismic and Evolutionary Biology, 2102 Biological Laboratories, 16 Divinity Ave., Cambridge, MA 02138. 

THE purpose here is to discuss the limits of theoretical population genetics. This 100-year-old field now sits close to the heart of modern biology. Theoretical population genetics is the framework for studies of human history (REICH et al. 2002) and the foundation for association studies, which aim to map the genes that cause human disease (JORDE 1995). Arguably of more importance, theoretical population genetics underlies our knowledge of within-species variation across the globe and for all kinds of life. In light of its many incarnations and befitting its ties to evolutionary biology, the limits of theoretical population genetics are recognized to be changing over time, with a number of new paths to follow. Stepping into this future, it will be important to develop new approximations that reflect new data and not to let well-accepted models diminish the possibilities.

It is valuable to define this field narrowly. Theoretical population genetics is the mathematical study of the dynamics of genetic variation within species. Its main purpose is to understand the ways in which the forces of mutation, natural selection, random genetic drift, and population structure interact to produce and maintain the complex patterns of genetic variation that are readily observed among individuals within a species. A tremendous amount is known about the workings of organisms in their environments and about interactions among species. Ideally, with constant reference to these facts (the bulk of which are undoubtedly yet to be discovered), theoretical population genetics begins by distilling everything into a workable mathematical model of genetic transmission within a species.

Taking this narrow view precludes the application of theoretical population genetics to studies of long-term evolutionary phenomena. This, instead, is the purview of evolutionary theory. For theoretical population genetics, processes over longer time scales are of interest only insofar as they directly affect observable patterns of variation within species. The focus on current genetic variation came to the fore during the 1970s and 1980s with the development of coalescent theory (KINGMAN 1982, 2000), or the mathematics of gene genealogies. EWENS (1990) reviews this transition from the forward-time approach of classical population genetics to the new, backward-time approach. It can be seen both in classical work (FISHER 1922; WRIGHT 1931) and in coalescent theory (KINGMAN 1982; HUDSON 1983; TAJIMA 1983), both of which are considered below, that the time frame over which the models of theoretical population genetics apply within a given species is a small multiple of $N_{\text{total}}$ generations, where $N_{\text{total}}$ is the total population size, or the count of all the individuals of the species. Looking at gene genealogies in humans, for example, this means roughly from $10^4$ to $10^6$ years (HARRIS and HEY 1999).

This allows us to suppose that the parameters affecting the species that we wish to model have remained relatively constant over time, compared to the situation in evolutionary theory. For purposes of discussion, consider the following simple model which, with embellishments, might serve to describe any species from Homo sapiens to Bacillus subtilis. The species is divided into $D$ subunits, each of size $N$, so that the total population size is $N_{\text{total}} = ND$. Corresponding to the phenomena listed above, the other parameters of the model are the per-locus, per-generation probability of mutation, $u$; the selective advantage or disadvantage, $s$, of some type relative to some other type in the population; and a parameter, $m$, which determines the extent of population structure.

The subunits in the model are used below to represent $D$ diploid individuals, so that $N = 2$ is the number of copies of each chromosome within each individual. Note that this departs from the usual notation, in which $N$ is the number of diploid individuals. The reason for this departure is to emphasize the similarities between the diploid model and other models of population structure. Thus, the same model is used to represent a population subdivided into $D$ local populations, or demes (GILMOUR and GREGOR 1939), each containing $N$ individual organisms.

Many details have been ignored in this model for the sake of simplicity. For example, mutation is a complex process, which includes various kinds of recombination, and natural selection is similarly not likely to be so simple that a single parameter captures all of its intricacies. In addition, the general term "population structure" encompasses dioecy, ploidy level, age structure, reproductive patterns such as partial selfing, as well as the various forms of geographical structure and dispersal. Finally, as noted above, all parameters are assumed to not change over time. However, with some flexibility in the interpretations of parameters, this model can be used to illustrate the limits of theoretical population genetics.

The ranges of the parameters are restricted by nature. Specifically, $D$ and $N$ are whole numbers, both of which are naturally assumed to be at least 1. The other parameters can vary continuously, but also have natural ranges: $0 \le u \le 1$, $s \ge -1$, and $0 \le m \le 1$. The last two require some context. Let $m$ be the fraction of each subunit (of which there are $D$) that is replaced by offspring randomly sampled from the entire population each generation. This is the island model of population subdivision and migration introduced by WRIGHT (1931), but it can be used to represent other forms of structure as well. Subdivision is at its least when $m = 1$ and is at its most when $m = 0$. Selection is imagined between two types, one with fitness 1 and the other with fitness $1 + s$, and $s \ge -1$ precludes negative fitness values. With selection among more than two types, the fitness of one of them is taken to be equal to one and this establishes the relative selection coefficients (values of $s$) of the others.
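The role of m as a mixing fraction can be illustrated with a minimal sketch. It assumes equal-sized subunits and tracks only the expected allele frequency in each one, ignoring drift, mutation, and selection; the function name `migrate` and the list-of-frequencies representation are illustrative choices, not part of the original model description.

```python
def migrate(deme_freqs, m):
    """Expected allele frequencies after one round of island-model migration.

    deme_freqs: allele frequency in each of the D subunits (equal sizes assumed).
    m: fraction of each subunit replaced by offspring drawn from the whole population.
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must lie in [0, 1]")
    # With equal subunit sizes, the population-wide frequency is the plain mean.
    mean = sum(deme_freqs) / len(deme_freqs)
    # A fraction (1 - m) stays local; a fraction m comes from the whole population.
    return [(1.0 - m) * x + m * mean for x in deme_freqs]


if __name__ == "__main__":
    start = [0.0, 1.0]            # two fully differentiated subunits
    print(migrate(start, 0.0))    # no exchange: differences persist
    print(migrate(start, 0.5))    # partial mixing
    print(migrate(start, 1.0))    # complete mixing in a single generation
```

With m = 1 every subunit takes on the population-wide frequency after one generation, and with m = 0 the subunits never exchange genes, which matches the two extremes of subdivision described above.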

The current and historical boundaries of theoretical population genetics can be understood with reference to the object of study, which is genetic variation within species, but also in terms of methodology. The ridiculously oversimplified model just described already has five parameters. Even with the restrictions above, there is an enormous five-dimensional space that defines all possible kinds of species under the model: $\{(D, N, u, s, m) : D \ge 1,\ N \ge 1,\ 0 \le u \le 1,\ s \ge -1,\ 0 \le m \le 1\}$. Theoretical population geneticists obtain predictive equations by simplifying such complicated models, again ideally with close attention to the biological relevance of any assumptions made. Formally, this is done by taking mathematical limits. The hope is that by doing so, i.e., by further restricting the ranges of parameters, tractable analytical results or simple approximations to the model can be obtained, which will be both useful and illuminating.

The first limiting result was established independently by HARDY (1908) and WEINBERG (1908) for the case of two alleles, A and a, with frequencies $p$ and $q = 1 - p$, respectively, in a population of diploid, monoecious organisms; see CROW (1988) for a perspective on this important result. In this case, the subunits in the model represent the organisms ($N = 2$), the population is supposed to be infinite ($D = \infty$), without mutation ($u = 0$) or selection ($s = 0$), and offspring are formed by either random mating or random union of gametes ($m = 1$). Then, the Hardy-Weinberg law states that the frequencies of the genotypes AA, Aa, and aa will be equal to $p^2$, $2pq$, and $q^2$ after a single generation, regardless of the initial genotype frequencies, and that they will remain in these frequencies forever. PROVINE (1971) discusses the important historical role of the Hardy-Weinberg law in evolutionary biology, which was to show that the mechanism of inheritance would not itself cause the variation upon which selection acts to be depleted in a population.
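The Hardy-Weinberg statement can be checked numerically with a short sketch. It assumes an infinite population with random union of gametes and no mutation or selection, as in the text; the function name `random_mating` and the tuple ordering (AA, Aa, aa) are illustrative choices.

```python
import math


def random_mating(freq_AA, freq_Aa, freq_aa):
    """Genotype frequencies (AA, Aa, aa) after one generation of random union of gametes."""
    total = freq_AA + freq_Aa + freq_aa
    # Frequency of allele A among gametes: all of AA plus half of Aa.
    p = (freq_AA + 0.5 * freq_Aa) / total
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


if __name__ == "__main__":
    # Start far from Hardy-Weinberg proportions: no heterozygotes at all.
    gen1 = random_mating(0.5, 0.0, 0.5)
    gen2 = random_mating(*gen1)
    print(gen1)
    print(gen2)
    # The proportions reached after one generation do not change afterwards.
    assert all(math.isclose(a, b) for a, b in zip(gen1, gen2))
```

Starting from a population made up only of the two homozygotes, a single round of random mating produces the proportions $p^2$, $2pq$, $q^2$, and a second round leaves them unchanged.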

The simplicity of the Hardy-Weinberg law is a consequence of its very stringent assumptions. It exists only in the special case in which the values of all parameters are fixed and given by $(D = \infty,\ N = 2,\ u = 0,\ s = 0,\ m = 1)$. FISHER (e.g., 1930) and HALDANE (e.g., 1932), and a great number of workers who followed their lead, were content with the assumption of infinite population size. They sought to establish the dynamics of allele frequencies in an expanded Hardy-Weinberg population that included mutation and selection. As a result, much of classical population genetics takes place in the restricted parameter space $\{(D, N, u, s, m) : D = \infty,\ N = 2,\ 0 \le u \le 1,\ s \ge -1,\ m = 1\}$. However, the overwhelming majority of results have been derived under the additional assumption that $u$ and $s$ are small.
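As a rough way to keep track of which region of the parameter space a given result lives in, the five parameters and their natural ranges can be written down as a small data structure. This is only a sketch: the class name `ModelParams`, the use of `math.inf` to stand in for the infinite-population limit, and the helper `is_classical` are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelParams:
    """Hypothetical container for the model parameters (D, N, u, s, m)."""
    D: float   # number of subunits; math.inf stands for the infinite-population limit
    N: int     # size of each subunit
    u: float   # per-locus, per-generation mutation probability
    s: float   # selection coefficient (fitnesses 1 and 1 + s)
    m: float   # fraction of each subunit replaced from the whole population

    def __post_init__(self):
        d_ok = self.D == math.inf or (float(self.D).is_integer() and self.D >= 1)
        if not d_ok:
            raise ValueError("D must be a whole number >= 1 (or math.inf)")
        if not (isinstance(self.N, int) and self.N >= 1):
            raise ValueError("N must be a whole number >= 1")
        if not 0.0 <= self.u <= 1.0:
            raise ValueError("u must lie in [0, 1]")
        if self.s < -1.0:
            raise ValueError("s must be >= -1 so that fitness 1 + s is not negative")
        if not 0.0 <= self.m <= 1.0:
            raise ValueError("m must lie in [0, 1]")

    @property
    def n_total(self):
        """Total population size, N_total = N * D."""
        return self.N * self.D

    def is_classical(self):
        """True inside the restricted space D = infinity, N = 2, m = 1."""
        return self.D == math.inf and self.N == 2 and self.m == 1.0


# The single point at which the Hardy-Weinberg law holds.
HARDY_WEINBERG = ModelParams(D=math.inf, N=2, u=0.0, s=0.0, m=1.0)
```

For instance, `ModelParams(D=10, N=2, u=1e-8, s=0.0, m=1.0).n_total` evaluates to 20, while `HARDY_WEINBERG.is_classical()` is true because the Hardy-Weinberg point lies inside the classical region.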