KEY CONCEPTS:
- The rate of substitution per year at neutral sites is greater in the mouse than in the human genome.
The principle of the analysis is summarized in Figure 4.9. We start with a family of related sequences
that have evolved by duplication and substitution from an original family
member. We assume that the common ancestral sequence can be deduced by taking
the base that is most common at each position. Then we can calculate the
divergence of each individual family member as the proportion of bases that
differ from the deduced ancestral sequence. In this example, individual members
vary from 0.13 to 0.18 in divergence, and the average is 0.16.
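The consensus-and-divergence calculation described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the family members are already aligned to equal length with no gaps. The five toy sequences are hypothetical and are not the data behind Figure 4.9, and the helper names `consensus` and `divergence` are likewise illustrative.

```python
from collections import Counter


def consensus(seqs):
    """Deduce the ancestral sequence by majority rule: the most common base in each column."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    # most_common(1) returns the top base; ties resolve to the base encountered first,
    # so a real analysis would need an explicit tie rule.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def divergence(seq, ancestor):
    """Proportion of positions at which seq differs from the deduced ancestor."""
    diffs = sum(a != b for a, b in zip(seq, ancestor))
    return diffs / len(ancestor)


# Hypothetical aligned family members (toy data, not from the text)
family = [
    "ACGTACGTAC",
    "ACGTACGAAC",
    "ACCTACGTAC",
    "ACGTTCGTAC",
    "ACGTACGTAG",
]

anc = consensus(family)
divs = [divergence(s, anc) for s in family]
mean_div = sum(divs) / len(divs)
print(anc, divs, round(mean_div, 3))
```

In this toy family the deduced ancestor matches the first member exactly, the other four each differ from it at one of ten positions (divergence 0.1), and the family average is 0.08. Real analyses must also handle insertions, deletions and columns where two bases tie.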
One family used for this analysis in the human and mouse
genomes derives from a sequence that is thought to have ceased to be active at
about the time of the divergence between Man and rodents (the LINES family; see
17.9 Retroposons fall into three
classes). This means that it has been diverging without any selective
pressure for the same length of time in both species. Its average divergence in
Man is ~0.17 substitutions per site, corresponding to a rate of 2.2 × 10⁻⁹
substitutions per base per year over the 75 million years since the separation.
In the mouse genome, however, neutral substitutions have occurred at twice this
rate, corresponding to 0.34 substitutions per site in the family, or a rate of
4.5 × 10⁻⁹ substitutions per base per year. Note, though, that if we calculated the rate
per generation instead of per year, it would be greater in Man than in mouse
(~2.2 × 10⁻⁸ as opposed to ~10⁻⁹), because a human generation is far longer than a mouse generation.
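The arithmetic behind these rates is divergence divided by elapsed time, and a per-year rate converts to a per-generation rate by multiplying by the generation time. The sketch below uses the 75-million-year separation and the divergences quoted in the text. The generation times are hypothetical round numbers chosen only for illustration, so the absolute per-generation values it produces are not meant to reproduce the figures above, only the reversal in ordering between the two species.

```python
YEARS_SINCE_SPLIT = 75e6  # human/rodent separation, from the text


def rate_per_year(divergence, years=YEARS_SINCE_SPLIT):
    """Neutral substitution rate: substitutions per site divided by elapsed years."""
    return divergence / years


def rate_per_generation(rate_year, generation_years):
    """Convert a per-year rate to a per-generation rate."""
    return rate_year * generation_years


human_year = rate_per_year(0.17)  # ~2.27e-9 substitutions per base per year
mouse_year = rate_per_year(0.34)  # ~4.53e-9 substitutions per base per year

# HYPOTHETICAL generation times (assumptions, not values from the text)
HUMAN_GEN_YEARS = 20.0
MOUSE_GEN_YEARS = 0.5

human_gen = rate_per_generation(human_year, HUMAN_GEN_YEARS)
mouse_gen = rate_per_generation(mouse_year, MOUSE_GEN_YEARS)

print(f"per year:       human {human_year:.2e}, mouse {mouse_year:.2e}")
print(f"per generation: human {human_gen:.2e}, mouse {mouse_gen:.2e}")
```

Per year the mouse rate is twice the human rate, but because a human generation is many times longer than a mouse generation, the per-generation ordering flips for any plausible choice of generation times.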
These figures probably underestimate the current rate of
substitution in the mouse. At the time of divergence the rates in both
species would have been the same, so the difference must have evolved since
then, and the average over 75 million years includes a period when the mouse
rate was lower. The current rate of neutral substitution per year in the mouse is probably
2-3× greater than the historical average. These
rates reflect the balance between the occurrence of mutations and the ability of
the genetic system of the organism to correct them. The difference between the
species demonstrates that each species has systems that operate with a
characteristic efficiency.
Comparing the mouse and human genomes allows us to assess
whether syntenic (corresponding) sequences show signs of conservation or have
diverged at the rate expected from the accumulation of neutral substitutions. The
proportion of sites that show signs of selection is ~5%. This is much higher
than the proportion that codes for protein or RNA (~1%), which implies that the
genome contains many more stretches whose sequence is important for non-coding
functions than for coding functions. Known regulatory elements are likely to
account for only a small part of this proportion. The figure also suggests that
most of the genome (the remaining ~95%) has no function that depends on its
exact sequence.
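The text does not say how sites showing signs of selection were identified. One simple way to make the idea concrete, swapped in here purely for illustration, is a lower-tail binomial test. Under neutral drift, each aligned site in a window differs between the two species with some probability p, so a window with far fewer mismatches than expected is a candidate for conservation. The neutral probability, the significance threshold and the windows below are all hypothetical, and the sketch ignores multiple testing and regional variation in the neutral rate.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def looks_constrained(diffs, n_sites, p_neutral, alpha=0.01):
    """Flag a window whose mismatch count is improbably low under neutral drift.

    Lower-tail binomial test: if the probability of seeing this few (or fewer)
    differences at the neutral rate is below alpha, call the window conserved.
    """
    return binom_cdf(diffs, n_sites, p_neutral) < alpha


# HYPOTHETICAL aligned windows: (mismatches, aligned sites)
windows = [(20, 50), (18, 50), (3, 50), (22, 50), (5, 50)]
P_NEUTRAL = 0.4  # hypothetical neutral mismatch probability per aligned site

flags = [looks_constrained(k, n, P_NEUTRAL) for k, n in windows]
fraction = sum(flags) / len(flags)
print(flags, fraction)
```

With these made-up numbers, only the windows with 3 and 5 mismatches fall far enough below the neutral expectation of 20 to be flagged, giving a conserved fraction of 0.4. On real alignments the estimated fraction depends heavily on the assumed neutral rate and threshold.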