Lee Altenberg's Home Page | Curriculum Vitae | Papers | Evolved Graphics | Sand Dune Ripples | E-mail me

About this document ...

Evolutionary Computation Models
from Population Genetics:
Part 2: An Historical Toolbox

A Tutorial presented at the Congress on Evolutionary Computation, 2000

Lee Altenberg, Ph.D.
Information and Computer Sciences
University of Hawai`i at Manoa


The Framework for Evolutionary Algorithms:

= \epsffile{GRAPHICS/Framework2.eps}

(mutation, crossover, inversion, conversion, etc.) acting on the
of the search space (genotypes, strings, programs, designs, etc.), generate a
Transmission function,
the probability distribution of offspring types from any combination (one, two, etc., sometimes the whole population) of parent types.
The relationship
between the transmission function and the fitness function is the main determinant the performance of the evolutionary algorithm.

The Two Elements of Darwinian


is an individual's type,
is the frequency of type i in the population,
is the fitness coefficient of type i, and
$T_{i\, \!\leftarrow\!\, j}$
is the probability that type j produces offspring of type i (i.e. transformation--mutation, recombination, migration, permutation, etc.).
$T_{i\, \!\leftarrow\!\, j,k}$
is when there are two parents, types j and k.

A population geneticist encounters
evolutionary computation:

``Fitness proportionate selection'' sounds like

``Velocity proportionate speed''
``Temperature proportionate heat''
``Mass proportionate weight''
``Length proportionate distance''
In other words, in evolutionary biology,

`Fitness' is the measure of selection.

Objective function (phenotype) $\neq$ Fitness

A better term: `Proportional Fitness'

The selection operator can be expressed as:

\begin{displaymath}x_i\,' = w_i x_i / \overline{w}, \mbox{\ \ where \ \ }\overline{w}= \sum_{i=1}^n w_i x_i,

or in vector form:

\begin{displaymath}\overline{w}\, {\bf x}\,' = {\bf W}{\bf x}, \mbox{\ \ where \ \ } {\bf W}= \mbox{\bf diag}{\left[ \stackrel{}{w_i}\right]}.

Q. Isn't $\overline{w}\, {\bf x}\,' = {\bf W}{\bf x}$ only for `proportional selection'?

A. No: the coefficient wi needn't be constant:

In fact $\{w_i\}$ can be functions of:

  • the frequencies of types in the population,
  • location,
  • time,
  • other species, agents, etc.--coevolution,
  • or be a random variable.

Other biological components of Selection:
Fitness Components:
  • Viability (survival rate)
  • Fecundity (offspring number)
  • Fertility (offspring number of particular mating types)

When multi-parameter fitness coefficients may be required:
  • Survivorship curves -- age dependent survival
  • Fecundity curves -- age dependent reproduction
  • Gene $\times$ Environment interactions

A Classification of Selection:


Q: Suppose there is an evolutionary game being played between the types in a population. Will the evolutionarily stable strategies (ESS) be the same whether you use constant (`proportional') selection, or truncation selection?
A: No! Illustrates the hazard of calling
uij the `fitnesses'.

Models of Canonical Evolutionary Algorithms:
  • Lewontin, R. C. 1964. The interaction of selection and linkage: I. General considerations; heterotic models. Genetics 49: 49-67 (esp. pp. 55-56).
  • Slatkin, M. 1970. Selection and polygenic characters. Proceedings of the National Academy of Sciences U.S.A. 66: 87-93.


\begin{displaymath}x_i\,' = \sum_{j=1}^n T(i \, \!\leftarrow\!\, j) \; w_j \; x_j / \overline{w},

or in vector form:

\begin{displaymath}\overline{w}\, {\bf x}\,' = {\bf T}{\bf W}{\bf x},

xi = frequency of genotype xi in the population, and $x_i\,'= $ frequency in the next generation;
wi = fitness of genotype i ;
$T(i \, \!\leftarrow\!\, j)$ = probability that genetic operators produce offspring i from parent j; and
$\overline{w}= \sum_{i=1}^n w_i x_i$ = mean fitness of the population.


\begin{displaymath}x_i\,' = \sum_{j,k=1}^n T(i \, \!\leftarrow\!\, j, k) \,\frac{w_{jk}\; x_j \; x_k}{\overline{w}},

or in vector form:

\begin{displaymath}\overline{w}\, {\bf x}\,' = {\bf T}{\bf W}({\bf x}\otimes {\bf x})

(a quadratic operator-- ${\bf x}(t)$ not tractable in general), where
is the tensor product (see below);
is the fitness of the parental pair (i,j)
${\bf W}$
is the n2 by n2 diagonal matrix of coefficients wi1 i2;
$T(i \, \!\leftarrow\!\, j, k)$
= probability that genetic operators produce offspring i from parents j and k,
$\sum_{i=1}^nT(i \, \!\leftarrow\!\, j, k) = 1$;
${\bf T}$
in this case is the n by n2 matrix of transmission probabilities;
For haploid selection, wi1 i2 = wi1 wi2 or $\mbox{${\bf W}= {\bf W}_{\tiny\cal H} \otimes {\bf W}_{\tiny\cal H}$ }$, hence:

\begin{displaymath}\overline{w}\, {\bf x}\,' = {\bf T}({\bf W}_{\tiny\cal H}\; {\bf x}\; \otimes \;{\bf W}_{\tiny\cal H}\; {\bf x}).

Solution for generic mutation and constant selection:

\begin{displaymath}\overline{w}(t) \; {\bf x}(t) = ({\bf T}{\bf W})^t {\bf x}(0)

Q. What is the limit as $t \!\rightarrow\!\infty$?

A. For generic mutation and selection:
One attractor, not multiple peaks.

Result 1   Suppose a mutation operator is ergodic: i.e. repeated application of the operator can mutate any genotype into any other genotype. Then, under an algorithm of constant selection and mutation, there is only one domain of attraction of the system--i.e. one `fitness peak'.

Perron-Frobenius Theory

How is this result derived?

Theorem 1 (Perron 1907, Frobenius)   If a matrix ${\bf M}$, or some power ${\bf M}^k$, has all positive entries, then:
The eigenvalue of ${\bf M}$ with the largest magnitude, $\rho({\bf M})$ (the `spectral radius'), is always real and positive, and strictly larger in magnitude that all other eigenvalues of ${\bf M}$.
The eigenvector, ${\bf x}({\bf M})$, corresponding to eigenvalue $\rho({\bf M})$, is positive.

Theorem 2 (Frobenius)   If ${\bf M}$ is non-negative and irreducible (i.e. for each i, j there exists some power k such that $[{\bf M}^k]_{i,j} >0$), then:
The eigenvalue of ${\bf M}$ with the largest magnitude, $\rho({\bf M})$ (the `spectral radius'), is always real and positive.
The eigenvector, ${\bf x}({\bf M})$, corresponding to eigenvalue $\rho({\bf M})$, is positive.

Example: Deceptive 1-Max Problem

=5in \epsffile{GRAPHICS/DeceptiveFitnesses.eps}
There is only one attractor at each value
$\mu$, but an `error catastrophe' is evident for $\mu*\approx 0.5$.
Finite Populations
  • So why do evolutionary algorithms with ergodic mutation seem to get trapped on `local fitness peaks'? e.g. Kauffman's NK Landscapes?
  • Answer: finite population size causing genetic drift: small random samples have stochastic variation, and types with few copies tend to go extinct rather than grow exponentially.
Fisher's Result on the Survival of a Fitter Mutant
  • Fisher, R. A. 1930. The genetical theory of natural selection. Oxford: Clarendon Press.

Result 2   Suppose that,
  • in a large population of individuals that produce an average of 1 offspring each,
  • a new mutant appears that produces a average of 1 + s offspring,
  • with variance $\sigma^2$, and where s is small.
The probability that the fitter mutant will not eventually go extinct is $2s/\sigma^2$.

The Wright-Fisher Model of Finite Populations
  • Wright, S. 1931. Evolution in Mendelian populations. Genetics 16:97-159.
  • Fisher, R. A. 1930. The genetical theory of natural selection. Oxford: Clarendon Press.

How can an evolutionary algorithm with finite population size be modelled? Wright and Fisher had the same idea for modelling for finite populations.

Wright-Fisher Model

Suppose that each individual of the next generation be created by independent application of selection and genetic operators to the population.

be the population size;
be a vector of the number of individuals of each type i in the population;
$y_i(\mbox{\boldmath\(\nu\)\unboldmath })$
be the probability a new individual of type i is generated by selection and genetic operators acting on the population $\mbox{\boldmath\(\nu\)\unboldmath }$.

Then the chance $\phi(\nu_i\,')$ that there are $\nu_i\,'$ individuals of type i in next generation is:

\begin{displaymath}\phi(\nu_i\,') = {{N}\choose{\nu_i}} y_i^{\nu_i} (1-y_i)^{N-\nu_i}.

The Wright-Fisher model forms a Markov chain. Wright and Fisher analyzed many of the properties of this Markov system, including probabilities of fixation, time to fixation, and stationary distributions of allele frequencies.

The theory of Wright and Fisher is known in the Evolutionary Computation community as the `Nix and Vose model':

  • A. E. Nix and M. D. Vose, 1991, Modeling genetic algorithms with Markov chains, Annals of Mathematics and Artificial Intelligence 5: 79-88.

Karlin's Theorem on
Genetic Operator Intensity

  • Karlin, Samuel. 1982. Classifications of selection-migration structures and conditions for a protected polymorphism. In Hecht, M. K. and B. Wallace and G. T. Prance, Evolutionary Biology 14: 61--204.

Result 3   Consider an evolutionary algorithm consisting of
  • constant selection, and
  • asexual genetic operators.
The mean fitness of the population at an attractor is a decreasing function of the probability of the genetic operator acting.

How is this obtained? Let the asexual genetic operator be represented by the Markov matrix ${\bf M}$, and let $\mu$ is the probability of applying the operator. Then the transmission matrix for the algorithm is:

\begin{displaymath}{\bf T}= (1-\mu) {\bf I}+ \mu {\bf M},

and the recursion is:

\begin{displaymath}\overline{w}\; {\bf x}\,' = [(1-\mu) {\bf I}+ \mu {\bf M}] \;{\bf W}\;{\bf x}

Karlin's Theorem, continued.

Samuel Karlin developed a theorem to determine when alleles introduced into a multi-deme population would increase in number.

Theorem 3 (Karlin, 1982)    Let ${\bf T}(\mu) = (1-\mu){\bf I}+ \mu{\bf M}$, where ${\bf M}$ is an irreducible Markov matrix, and let ${\bf W}\neq {\bf I}$ be a strictly positive diagonal matrix. Then the spectral radius $\rho({\bf T}(\mu){\bf W})$ is strictly decreasing in $\mu$:

\begin{displaymath}\frac{d}{d\mu}\rho({\bf T}(\mu){\bf W}) < 0, \mbox{ for }0\leq\mu\leq 1.

For the global attractor, $\hat{{\bf x}}$, which is the leading eigenvector of ${\bf T}{\bf W}$, we have:

\begin{displaymath}[(1-\mu) {\bf I}+ \mu {\bf M}]\;{\bf W}\;\hat{{\bf x}}= {\bf ...
...\bf x}}\; \rho({\bf T}{\bf W}) = \hat{{\bf x}}\; \overline{w}.

Hence the mean fitness of the global attractor,

\begin{displaymath}\widehat{\overline{w}} = \rho(\, {\bf T}(\mu){\bf W}\,),

is a decreasing function of the operator probability $\mu$.

Karlin's Theorem illustrated with the
Deceptive Function

= \epsffile{GRAPHICS/SpectralRadius.eps}
Tensor (Kronecker) Products
  • Roux, C. Z. 1974. Hardy-Weinberg equilibria in random mating populations. Theoretical Population Biology 5: 393-416.
  • Feldman, M. W. and J. Krakauer. 1976. Genetic modification and modifier polymorphisms. pp. 547-583 in Population Genetics and Ecology, ed. S. Karlin and E. Nevo. Academic Press, New York.
  • Karlin, S. and U. Liberman. 1978. The two-locus multi-allele additive viability model. J. Mathematical Biology 5: 201-211.

Useful for:

  • Multiple Loci
  • Sexual Reproduction
  • Compound types: genotype $\otimes$ phenotype $\otimes$ deme
  • Compound operators: mutation $\otimes$ recombination $\otimes$ migration

Tensor Products, continued

\begin{displaymath}{\bf A}= {\left[ \stackrel{}{
a_{11} & a_{12} & a_{13}\\
a_{21} & a_{22} & a_{23}


\begin{displaymath}{\bf B}= {\left[ \stackrel{}{
b_{11} & b_{...
b_{31} & b_{32} \\
b_{41} & b_{42}

Then the tensor product is:

\begin{eqnarray*}{\bf A}\otimes {\bf B}&=& {\left[ \stackrel{}{
...2} b_{42} & a_{23} b_{41} & a_{23} b_{42}

In general,

\begin{displaymath}{\bf A}\otimes {\bf B}= {\left[ \stackrel{}{a_{i_1 j_1} b_{i_...
..._1=1\ldots m_1\\ i_2=1\ldots n_2, j_2=1\ldots m_2 \end{array}}

Tensor Products for Compound States

Frequency Vector of Multiple Loci in Linkage Equilibrium, i.e. no allele associations between loci:

\begin{displaymath}{\bf x}= {\bf x}_1 \otimes {\bf x}_2 \otimes \ldots \otimes {\bf x}_L

Frequency Vector of Genotypes Randomly Spread over Multiple Demes

\begin{displaymath}{\bf p}= {\bf x}\otimes \mbox{\boldmath\(\nu\)\unboldmath },

where ${\bf x}$ is the vector of genotype frequencies and $\mbox{\boldmath\(\nu\)\unboldmath }$ is the vector of deme sizes (as fractions of the whole meta-population).

Tensor Products for Compound Operators

When multiple independent operators are acting on the types in the population, the compound operator can be represented by tensor products.

Mutation at Multiple Loci

Suppose locus k mutates with transition probabilities

\begin{displaymath}{\bf T}_k = (1-\mu_k){\bf I}+ \mu_k{\bf M}_k .

Then the mutation operator acting on the entire genotype of L loci is:

\begin{displaymath}{\bf T}_M = {\bf T}_1 \otimes {\bf T}_2 \otimes \ldots \otimes {\bf T}_L.

Mutation + Recombination

When both recombination and mutation act independently on the genotype, the transmission function can be represented as (R=recombination, M=mutation, D=migration):

\begin{displaymath}{\bf T}= {\bf T}_R({\bf T}_M \otimes {\bf T}_M)

Mutation + Recombination + Migration

\begin{displaymath}{\bf T}= {\bf T}_D \otimes {\bf T}_R({\bf T}_M \otimes {\bf T}_M)

Generalized Nonepistatic Fitnesses (Karlin and Liberman 1979)

\begin{displaymath}{\bf W}= \sum_{ {\bf\kappa}\in \{0,1\}^L} c({\bf\kappa}) {\bf...
...bf W}_2^{\kappa_2} \otimes \ldots \otimes {\bf W}_L^{\kappa_L}

Selection and Transformation Commute:

Does the order $\mbox{Mutation} \!\rightarrow\!\mbox{Selection}$ vs. $\mbox{Selection} \!\rightarrow\! \mbox{Mutation}$ produce different evolutionary outcomes?

Assume frequency-independent selection:

\begin{displaymath}{\bf x}\,' = {\bf T}{\bf W}{\bf x}/ \overline{w},

Clearly, ${\bf T}{\bf W}\neq {\bf W}{\bf T}$. However, make a change of variables: let ${\bf y}= {\bf W}{\bf x}$, and ${\bf y}\,' = {\bf W}{\bf x}\,'$. Then:

\begin{eqnarray*}{\bf y}\,' &=& {\bf W}{\bf x}\,' = {\bf W}{\bf T}{\bf W}{\bf x}/ \overline{w}\\
& = &{\bf W}{\bf T}{\bf y}/ \overline{w}.

So the operators ${\bf T}{\bf W}$ and ${\bf W}{\bf T}$ produce the same trajectories with mapping between variables ${\bf y}= {\bf W}{\bf x}$.


\begin{eqnarray*}{\bf x}\,' &=& {\bf T}({\bf W}{\bf x}\otimes {\bf W}{\bf x}) / ...
...\!\mbox{Random Mating} \!\rightarrow\! \mbox{Transformation} ),

Again let ${\bf y}= {\bf W}{\bf x}$, and ${\bf y}\,' = {\bf W}{\bf x}\,'$.

\begin{eqnarray*}{\bf y}\,' &=& {\bf W}{\bf x}\,' = {\bf W}{\bf T}({\bf W}{\bf x...
...ghtarrow\!\mbox{Transformation} \!\rightarrow\!\mbox{Selection})

Selection on both Parents and Offspring:

What is the effect of applying selection to the choice of parents, and then applying selection again to the offspring?

\begin{displaymath}\mbox{Selection} \!\rightarrow\!\mbox{Random Mating} \!\rightarrow\! \mbox{Transformation} \!\rightarrow\! \mbox{Selection}

\begin{displaymath}{\bf x}\,' = {\bf W}{\bf T}({\bf W}{\bf x}\otimes {\bf W}{\bf x}) / \overline{w}^2

Now, make a change of variables: let ${\bf y}= {\bf W}^{-1}{\bf x}$, and ${\bf y}\,' = {\bf W}^{-1}{\bf x}\,'$.

\begin{eqnarray*}{\bf y}\,' &=& {\bf W}^{-1}{\bf x}\,' = {\bf W}^{-1}{\bf W}{\bf...
...\bf T}({\bf W}^2{\bf y}\otimes{\bf W}^2{\bf y}) / \overline{w}^2


\begin{displaymath}\mbox{Random Mating} \!\rightarrow\! \mbox{Selection} \!\rightarrow\!\mbox{Transformation} \!\rightarrow\! \mbox{Selection}

is equivalent to:

\begin{displaymath}\mbox{Random Mating} \!\rightarrow\! \mbox{Selection}^2 \!\rightarrow\!\mbox{Transformation} .

Gieringer's Representation of `all possible'
Crossover operators

  • Geiringer, H. 1944. On the probability theory of linkage in Mendelian heredity. Ann. Math. Statist. 15: 25-57.
  • Karlin, Sam and Uri Liberman. 1978. Classification and comparisons of multilocus recombination distributions. Proc. Natl. Acad. Sci. 75: 6332-6336.

Geiringer (1944) developed a way to represent all possible recombination operators acting on two parents. This includes single point, multiple point, and uniform crossover. She considered which genes get transmitted together into an offspring.

Gieringer, continued
  • All the genes transmitted from one parent can be marked with a 0, and
  • all the genes transmitted from the other parent can be marked with a 1.
  • Hence a particular recombinant offspring can be represented as a binary string, e.g. (0,1,1,0,1,1,0,0,0).
  • Hence, the recombination operation corresponds to a mask

    \begin{displaymath}{\bf c}\in \{0, 1\}^L .

  • There are $2^L = \Vert \{0, 1\}^L \Vert$ possible such recombination operations.
  • What defines a crossover operator is the probability of performing a particular recombination operation.
  • Thus, a crossover operator can be defined as a probability distribution over the set of masks:

    \begin{displaymath}{\cal R}: \{0, 1\}^L \mapsto [0, 1]: \; \sum_{{\bf c}\in\{0, 1\}^L} {\cal R}({\bf c}) = 1

  • Since every recombination event can produce an offspring with the genes from the reversed parents, we have the symmetry constraint that:

    \begin{displaymath}{\cal R}(c_1, c_2, \ldots, c_L) = {\cal R}(1-c_1, 1-c_2, \ldots, 1-c_L)

Representing the Mult-locus, Multi-Allele Model with Nonepistatic Selection and Arbitrary Crossover

  • Karlin, Sam and Uri Liberman. 1979. Central equilibria in multilocus systems. I. Generalized nonepistatic selection regimes. Genetics 91: 777-798.

\begin{displaymath}\overline{w}{\bf x}\,' = \sum_{{\bf c}} {\cal R}({\bf c}) ({\...
...}_1^{1-c_1} \otimes \ldots \otimes {\bf W}_L^{1-c_L}) {\bf x},

where each ${\bf W}={\left[ \stackrel{}{w_{ij}}\right]}$ is a diploid fitness matrix.
  • Note: Haploidy is the special case where:

    \begin{displaymath}{\bf W}= {\bf w}{\bf w}^{{\rm T}}, \;\;{\bf w}= {\left[ \stackrel{}{\begin{array}{c}w_1\\ \vdots\\ w_L\end{array}}\right]}.

  • Analytical solutions for the existence and stability of polymorphisms have been found for the general non-epistatic model.
  • Generalized non-epistasis is a highly non-generic space of fitness functions.
  • Trajectories and equilibria of the general epistatic model are unsolved in closed form.

Takeover Times
  • Goldberg, D. and Deb. 1991. A comparative analysis of selection schemes used in genetic algorithms. In G. Rawlins, editor, Foundations of Genetic Algorithms, pages 69-93. Morgan Kaufmann, San Mateo, CA.
Goldberg and Deb use a `pseudo finite population' analysis:
  • Let i be the fittest type;
  • Calculate, in terms of n, the time for an infinite population to go from xi = 1/n to xi = 1-1/n under selection alone.

  • Problem: method does not take into account genetic drift in finite populations.

`Takeover Times' known in Population Genetics as Fixation Times


When only one variant remains in the populaton.
When more than one variant exists in the populaton.
Fixation Time:
Time until the population goes from polymorphism to fixation.
Conditional Fixation Time:
Time until fixation, given that the population fixes on specific genotype.
Takeover Time:
Time until the fittest type goes to fixation, given that the population fixes on it.

Literature on Fixation Times

  • Fisher, R. A. 1930. The genetical theory of natural selection. Oxford: Clarendon Press.
  • Wright, S. 1931. Evolution in Mendelian populations. Genetics 16:97-159.
  • Kimura, M. and T. Ohta. 1969. The average number of generations until fixation of a mutant gene in a finite population. Genetics 61: 763-771.
  • Carr, R. N., and R. F. Nassar. 1970. Effects of selection and drift on the dynamics of finite populations. II. Expected time to fixation or loss of an allele. Biometrics 26: 221-227.
  • Ohta, T., and M. Kimura. 1972. Fixation time of overdominant alleles influenced by random fluctuation of selection intensity. Genetical Research 20: 1-7.
  • Nei, M., and A. K. Roychoudhury. 1973. Probability of fixation and mean fixation time of an overdominant mutation. Genetics 74: 371-380.
  • Nassar, R. F., and R. D. Cook. 1974. Ultimate probability of fixation and time to fixation or loss of a gene under a variable fitness model. Theoretical and Applied Genetics 44: 247-254.
  • Ewens, W. J.. 1979. Mathematical Population Genetics. Springer-Verlag, Berlin.

Conditional Fixation Time
of a Fitter Allele
  • Kimura, M. and T. Ohta. 1969. The average number of generations until fixation of a mutant gene in a finite population. Genetics 61: 763-771.

\begin{eqnarray*}T(\frac{1}{2N}\!\rightarrow\!\, 1) =& &\\
...\int_0^{\frac{1}{2N}} (e^{2NSx}-1)(1-e^{-2NSx})\frac{dx}{x(1-x)}

About this document ...
Lee Altenberg