Friday, September 17, 2010

The Limits of Theoretical Population Genetics

John Wakeley¹

Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138

¹ Address for correspondence: Department of Organismic and Evolutionary Biology, 2102 Biological Laboratories, 16 Divinity Ave., Cambridge, MA 02138.

E-mail: wakeley@fas.harvard.edu

THE purpose here is to discuss the limits of theoretical population genetics. This 100-year-old field now sits close to the heart of modern biology. Theoretical population genetics is the framework for studies of human history (REICH et al. 2002) and the foundation for association studies, which aim to map the genes that cause human disease (JORDE 1995). Arguably of more importance, theoretical population genetics underlies our knowledge of within-species variation across the globe and for all kinds of life. In light of its many incarnations and befitting its ties to evolutionary biology, the limits of theoretical population genetics are recognized to be changing over time, with a number of new paths to follow. Stepping into this future, it will be important to develop new approximations that reflect new data and not to let well-accepted models diminish the possibilities.

It is valuable to define this field narrowly. Theoretical population genetics is the mathematical study of the dynamics of genetic variation within species. Its main purpose is to understand the ways in which the forces of mutation, natural selection, random genetic drift, and population structure interact to produce and maintain the complex patterns of genetic variation that are readily observed among individuals within a species. A tremendous amount is known about the workings of organisms in their environments and about interactions among species. Ideally, with constant reference to these facts (the bulk of which are undoubtedly yet to be discovered), theoretical population genetics begins by distilling everything into a workable mathematical model of genetic transmission within a species.

Taking this narrow view precludes the application of theoretical population genetics to studies of long-term evolutionary phenomena. This, instead, is the purview of evolutionary theory. For theoretical population genetics, processes over longer time scales are of interest only insofar as they directly affect observable patterns of variation within species. The focus on current genetic variation came to the fore during the 1970s and 1980s with the development of coalescent theory (KINGMAN 1982, 2000), or the mathematics of gene genealogies. EWENS (1990) reviews this transition from the forward-time approach of classical population genetics to the new, backward-time approach. It can be seen both in classical work (FISHER 1922; WRIGHT 1931) and in coalescent theory (KINGMAN 1982; HUDSON 1983; TAJIMA 1983), both of which are considered below, that the time frame over which the models of theoretical population genetics apply within a given species is a small multiple of N_total generations, where N_total is the total population size, or the count of all the individuals of the species. Looking at gene genealogies in humans, for example, it seems that this means roughly from 10⁴ to 10⁶ years (HARRIS and HEY 1999).
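The claim that the relevant time frame is a small multiple of N_total generations can be illustrated with the standard coalescent result for the time back to the most recent common ancestor of a sample. The sketch assumes a single unstructured population of N_total gene copies (which matches N_total = ND when each subunit is a diploid individual with N = 2), with no selection or structure, so that while k lineages remain, the expected wait until two of them coalesce is N_total / C(k, 2) generations. Summed over k, this gives 2·N_total·(1 − 1/n) for a sample of n copies, which is always less than 2·N_total. The function names and the numbers in the example are hypothetical.

```python
from fractions import Fraction


def expected_tmrca(n: int, n_total: int) -> Fraction:
    """Expected generations back to the common ancestor of n sampled copies.

    Assumption of this sketch: standard coalescent approximation for one
    unstructured population of n_total gene copies. While k lineages remain,
    the mean waiting time to the next coalescence is n_total / C(k, 2).
    """
    if n < 2:
        raise ValueError("need at least two sampled copies")
    total = Fraction(0)
    for k in range(2, n + 1):
        pairs = k * (k - 1) // 2           # number of lineage pairs, C(k, 2)
        total += Fraction(n_total, pairs)  # mean waiting time with k lineages
    return total


def closed_form(n: int, n_total: int) -> Fraction:
    """The same quantity written as 2 * n_total * (1 - 1/n)."""
    return 2 * n_total * (1 - Fraction(1, n))


# Hypothetical example: 10 sampled copies from a population of 1,000 copies.
print(expected_tmrca(10, 1000))  # 1800
print(closed_form(10, 1000))     # 1800
```

However large the sample, the expected depth of the genealogy stays below 2·N_total generations, which is one way to see why the models only need their parameters to hold over a few multiples of N_total.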

This allows us to suppose that the parameters affecting the species that we wish to model have remained relatively constant over time, compared to the situation in evolutionary theory. For purposes of discussion, consider the following simple model which, with embellishments, might serve to describe any species from Homo sapiens to Bacillus subtilis. The species is divided into D subunits, each of size N, so that the total population size is N_total = ND. Corresponding to the phenomena listed above, the other parameters of the model are the per-locus, per-generation probability of mutation u, the selective advantage or disadvantage, s, of some type relative to some other type in the population, and a parameter, m, which determines the extent of population structure.

The subunits in the model are used below to represent D diploid individuals, so that N = 2 is the number of copies of each chromosome within each individual. Note that this departs from the usual notation, in which N is the number of diploid individuals. The reason for this departure is to emphasize the similarities between the diploid model and other models of population structure. Thus, the same model is used to represent a population subdivided into D local populations, or demes (GILMOUR and GREGOR 1939), each containing N individual organisms.

Many details have been ignored in this model for the sake of simplicity. For example, mutation is a complex process, which includes various kinds of recombination, and natural selection is similarly not likely to be so simple that a single parameter captures all of its intricacies. In addition, the general term "population structure" encompasses dioecy, ploidy level, age structure, reproductive patterns such as partial selfing, as well as the various forms of geographical structure and dispersal. Finally, as noted above, all parameters are assumed to not change over time. However, with some flexibility in the interpretations of parameters, this model can be used to illustrate the limits of theoretical population genetics.

The ranges of the parameters are restricted by nature. Specifically, D and N are whole numbers, both of which it is natural to assume are $\ge 1$. The other parameters can vary continuously, but also have natural ranges: $0 \le u \le 1$, $s \ge -1$, and $0 \le m \le 1$. The last two require some context. Let m be the fraction of each subunit (of which there are D) that is replaced by offspring randomly sampled from the entire population each generation. This is the island model of population subdivision and migration introduced by WRIGHT (1931), but it can be used to represent other forms of structure as well. Subdivision is at its least when m = 1 and is at its most when m = 0. Selection is imagined between two types, one with fitness 1 and the other with fitness 1 + s, and $s \ge -1$ precludes negative fitness values. With selection among more than two types, the fitness of one of them is taken to be equal to one and this establishes the relative selection coefficients (values of s) of the others.
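The roles of s and m can be made concrete by computing expected allele frequencies one generation ahead while ignoring random drift (that is, treating each subunit as very large). This is a sketch under assumptions the article does not spell out: each subunit is summarized by the frequency of the type with fitness 1 + s, selection is genic (haploid) and happens before migration, and all subunits have the same size, so the migrant pool carries the mean frequency across subunits. The function names are hypothetical.

```python
from typing import List


def select(p: float, s: float) -> float:
    """Frequency of the type with fitness 1 + s after one round of selection.

    The other type has fitness 1, so mean fitness is 1 + s * p.
    Genic (haploid) selection is an assumption of this sketch.
    """
    if s < -1:
        raise ValueError("s must be >= -1 so that no fitness is negative")
    mean_fitness = 1 + s * p
    if mean_fitness == 0:
        raise ValueError("mean fitness is zero; no offspring can be produced")
    return p * (1 + s) / mean_fitness


def migrate(freqs: List[float], m: float) -> List[float]:
    """Island-model migration, assuming equal subunit sizes.

    A fraction m of each subunit is replaced by copies drawn from the whole
    population, whose frequency is the mean over subunits.
    """
    if not 0 <= m <= 1:
        raise ValueError("m must lie in [0, 1]")
    pool = sum(freqs) / len(freqs)
    return [(1 - m) * p + m * pool for p in freqs]


def next_generation(freqs: List[float], s: float, m: float) -> List[float]:
    """Expected subunit frequencies after selection then migration (no drift)."""
    return migrate([select(p, s) for p in freqs], m)


freqs = [0.1, 0.5, 0.9]
print(next_generation(freqs, s=0.0, m=1.0))  # every subunit pulled to the mean
print(next_generation(freqs, s=0.0, m=0.0))  # no exchange: subunits unchanged
```

With m = 1 every subunit reaches the population mean in a single generation, and with m = 0 the subunits evolve independently, in line with subdivision being least at m = 1 and greatest at m = 0.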

The current and historical boundaries of theoretical population genetics can be understood with reference to the object of study, which is genetic variation within species, but also in terms of methodology. The ridiculously oversimplified model just described already has five parameters. Even with the restrictions above, there is an enormous five-dimensional space that defines all possible kinds of species under the model: $\{(D, N, u, s, m);\ D \ge 1,\ N \ge 1,\ 0 \le u \le 1,\ s \ge -1,\ 0 \le m \le 1\}$. Theoretical population geneticists obtain predictive equations by simplifying such complicated models, again ideally with close attention to the biological relevance of any assumptions made. Formally, this is done by taking mathematical limits. The hope is that by doing so, i.e., by further restricting the ranges of parameters, tractable analytical results or simple approximations to the model can be obtained, which will be both useful and illuminating.
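The five-dimensional parameter space can be written down directly as a small container type that rejects points outside the natural ranges. A minimal sketch, assuming Python's dataclasses; the class name ModelParams is hypothetical, and allowing D to be infinite is an addition of this sketch so that limiting cases such as the Hardy-Weinberg setting (D = ∞) can be represented. It encodes only the ranges and the relation N_total = ND, not any dynamics.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelParams:
    """One point (D, N, u, s, m) of the model's parameter space."""
    D: float  # number of subunits; math.inf allowed for limiting cases (assumption)
    N: int    # size of each subunit (N = 2 for diploid individuals)
    u: float  # per-locus, per-generation probability of mutation
    s: float  # selective advantage (s > 0) or disadvantage (s < 0)
    m: float  # fraction of each subunit replaced from the whole population

    def __post_init__(self):
        if not (self.D >= 1 and (self.D == math.inf or float(self.D).is_integer())):
            raise ValueError("D must be a whole number >= 1 (or infinity)")
        if not (isinstance(self.N, int) and self.N >= 1):
            raise ValueError("N must be a whole number >= 1")
        if not 0 <= self.u <= 1:
            raise ValueError("u must lie in [0, 1]")
        if self.s < -1:
            raise ValueError("s must be >= -1 so that fitnesses are non-negative")
        if not 0 <= self.m <= 1:
            raise ValueError("m must lie in [0, 1]")

    @property
    def n_total(self) -> float:
        """Total population size, N_total = N * D."""
        return self.N * self.D


# Hypothetical point: 10 diploid individuals, weak selection, some structure.
example = ModelParams(D=10, N=2, u=1e-8, s=0.01, m=0.1)
print(example.n_total)  # 20
```

Each classical limit discussed in the text then corresponds to pinning some fields to boundary values (for example D = math.inf, u = 0.0, s = 0.0, m = 1.0) rather than to a new model.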

The first limiting result was established independently by HARDY (1908) and WEINBERG (1908) for the case of two alleles, A and a, with frequencies p and q = 1 − p, respectively, in a population of diploid, monoecious organisms; see CROW (1988) for a perspective on this important result. In this case, the subunits in the model represent the organisms (N = 2), the population is supposed to be infinite ($D = \infty$), without mutation (u = 0) or selection (s = 0), and offspring are formed by either random mating or random union of gametes (m = 1). Then, the Hardy-Weinberg law states that the frequencies of the genotypes AA, Aa, and aa will be equal to p², 2pq, and q² after a single generation, regardless of the initial genotype frequencies, and that they will remain in these frequencies forever. PROVINE (1971) discusses the important historical role of the Hardy-Weinberg law in evolutionary biology, which was to show that the mechanism of inheritance would not itself cause the variation upon which selection acts to be depleted in a population.
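The Hardy-Weinberg law can be checked with exact rational arithmetic. A minimal sketch, assuming random union of gametes in an infinite population with no mutation or selection, so that each offspring genotype frequency is a product of gamete frequencies; the function name and the starting genotype frequencies (no heterozygotes at all) are hypothetical choices for illustration.

```python
from fractions import Fraction


def random_union(genotypes):
    """One generation of random union of gametes (infinite population assumed).

    genotypes: (P_AA, P_Aa, P_aa), summing to 1.
    Returns offspring genotype frequencies (p^2, 2pq, q^2).
    """
    p_AA, p_Aa, p_aa = genotypes
    p = p_AA + p_Aa / 2  # frequency of allele A among gametes
    q = 1 - p            # frequency of allele a
    return (p * p, 2 * p * q, q * q)


# Arbitrary start far from equilibrium: homozygotes only, no Aa.
start = (Fraction(3, 10), Fraction(0), Fraction(7, 10))
gen1 = random_union(start)
gen2 = random_union(gen1)
print(gen1)          # (Fraction(9, 100), Fraction(21, 50), Fraction(49, 100))
print(gen1 == gen2)  # True
```

The allele frequency p = 3/10 is the same before and after, and the genotype frequencies reach p², 2pq, q² after one generation and then stay there, as the law states.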

The simplicity of the Hardy-Weinberg law is a consequence of its very stringent assumptions. It holds only in the special case in which the values of all parameters are fixed and given by $(D = \infty,\ N = 2,\ u = 0,\ s = 0,\ m = 1)$. FISHER (e.g., 1930) and HALDANE (e.g., 1932), and a great number of workers who followed their lead, were content with the assumption of infinite population size. They sought to establish the dynamics of allele frequencies in an expanded Hardy-Weinberg population that included mutation and selection. As a result, much of classical population genetics takes place in the restricted parameter space $\{(D, N, u, s, m);\ D = \infty,\ N = 2,\ 0 \le u \le 1,\ s \ge -1,\ m = 1\}$. However, the overwhelming majority of results have been derived under the additional assumption that u and s are small.
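Why the small-u, small-s assumption is so convenient can be seen in the deterministic recursion for an infinite population. This sketch adds assumptions the article does not make: two types, the favored one with fitness 1 + s, genic (haploid) selection, followed by mutation at rate u in each direction between the two types. The exact one-generation change in frequency is a ratio, but when s and u are small it is close to the much simpler s·p·(1 − p) + u·(1 − 2p), with an error of order s² and s·u. The function names are hypothetical.

```python
def exact_step(p: float, s: float, u: float) -> float:
    """One generation: genic selection, then symmetric mutation at rate u.

    Symmetric two-way mutation is an assumption of this sketch.
    """
    p_sel = p * (1 + s) / (1 + s * p)           # selection; mean fitness 1 + s*p
    return p_sel * (1 - u) + (1 - p_sel) * u    # A -> a and a -> A, each at rate u


def approx_change(p: float, s: float, u: float) -> float:
    """Leading-order change in p when s and u are both small."""
    return s * p * (1 - p) + u * (1 - 2 * p)


p = 0.3
for s, u in [(0.5, 0.1), (0.01, 1e-5)]:
    exact = exact_step(p, s, u) - p
    approx = approx_change(p, s, u)
    print(f"s={s}, u={u}: exact change {exact:.6g}, approximation {approx:.6g}")
```

For the large values the two numbers disagree noticeably, while for the small values they agree to within a fraction of a percent; this is the sense in which small u and s turn the exact recursion into simple, analyzable expressions.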
