Molecular Biology: Sequence divergence is the basis for the evolutionary clock

KEY TERMS:

A neutral mutation has no significant effect on evolutionary fitness and usually has no effect on the phenotype.
Random drift describes the chance fluctuation (without selective pressure) of the frequencies of two alleles in a population.
Fixation is the process by which a new allele replaces the allele that was previously predominant in a population.
Divergence is the percent difference in nucleotide sequence between two related DNA sequences or in amino acid sequences between two proteins.
Replacement sites in a gene are those at which mutations alter the amino acid that is coded.
A silent site in a coding region is one where mutation does not change the sequence of the protein.
The evolutionary clock is defined by the rate at which mutations accumulate in a given gene.

KEY CONCEPTS:

The sequences of homologous genes in different species vary at replacement sites (where mutation causes amino acid substitutions) and silent sites (where mutation does not affect the protein sequence).
Mutations accumulate at silent sites ~10× faster than at replacement sites.
The evolutionary divergence between two proteins is measured by the percentage of positions at which the corresponding amino acids differ.
Mutations accumulate at a roughly constant rate after genes separate, so the divergence between any pair of globin sequences is proportional to the time since their genes separated.

Most changes in protein sequences occur by small mutations that accumulate slowly with time. Point mutations and small insertions and deletions arise by chance, probably at roughly equal frequency in all regions of the genome, except at hotspots, where mutations occur much more frequently. Most mutations that change the amino acid sequence are deleterious and will be eliminated by natural selection.

Few mutations are advantageous, but when a rare one occurs, it is likely to spread through the population, eventually replacing the former sequence. When a new variant replaces the previous version of the gene, it is said to have become fixed in the population.

A contentious issue is what proportion of mutational changes in an amino acid sequence are neutral, that is, without any effect on the function of the protein, and able therefore to accrue as the result of random drift and fixation.
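Random drift and fixation can be illustrated with a toy simulation. The sketch below assumes a Wright-Fisher model, a standard population-genetics idealization that this section does not itself describe: a haploid population of constant size n, non-overlapping generations, and no selection. The names `wright_fisher_fixes` and `fixation_fraction` are hypothetical. Under these assumptions, a neutral allele should become fixed with a probability equal to its starting frequency.

```python
import random


def wright_fisher_fixes(n: int, start_count: int, rng: random.Random) -> bool:
    """Follow one neutral allele until it is lost (False) or fixed (True).

    Each generation, every one of the n offspring picks its parent at random,
    so the allele count changes only by chance (random drift).
    """
    count = start_count
    while 0 < count < n:
        freq = count / n
        # Binomial sampling of the next generation: one draw per individual.
        count = sum(1 for _ in range(n) if rng.random() < freq)
    return count == n


def fixation_fraction(n: int, start_count: int, replicates: int, seed: int = 1) -> float:
    """Fraction of independent replicate populations in which the allele fixes."""
    rng = random.Random(seed)
    fixed = sum(wright_fisher_fixes(n, start_count, rng) for _ in range(replicates))
    return fixed / replicates


if __name__ == "__main__":
    # Allele starts as 5 of 20 copies (frequency 0.25); roughly a quarter of runs should fix.
    print(fixation_fraction(n=20, start_count=5, replicates=2000))
```

Because nothing about the allele affects its chance of being copied, whether it replaces the alternative allele is purely a matter of chance; this is the sense in which neutral changes can accrue through random drift and fixation.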

The rate at which mutational changes accumulate is a characteristic of each protein, presumably depending at least in part on its flexibility with regard to change. Within a species, a protein evolves by mutational substitution, followed by elimination or fixation within the single breeding pool. Remember that when we scrutinize the gene pool of a species, we see only the variants that have survived. When multiple variants are present, they may be stable (because neither has any selective advantage), or one may in fact be transient because it is in the process of being displaced.

When a species separates into two new species, each now constitutes an independent pool for evolution. By comparing the corresponding proteins in two species, we see the differences that have accumulated between them since the time when their ancestors ceased to interbreed. Some proteins are highly conserved, showing little or no change from species to species. This indicates that almost any change is deleterious and therefore selected against.

The difference between two proteins is expressed as their divergence, the percentage of positions at which the amino acids differ. The divergence between proteins can differ from the divergence between the corresponding nucleic acid sequences. The source of this difference is that each amino acid is represented by a three-base codon, in which the third base often has no effect on the meaning.
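The definition of divergence translates directly into code. This minimal sketch assumes the two sequences are already aligned to the same length with no gaps; the function name `percent_divergence` and the toy fragments are hypothetical. It works the same way for amino acid or nucleotide sequences.

```python
def percent_divergence(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return 100.0 * differences / len(seq_a)


# Toy 10-residue fragments that differ at 2 positions.
print(percent_divergence("VHLTPEEKSA", "VHLTPVEKNA"))  # 20.0
```

Applied to the β- and δ-globin chains discussed in the text (10 differences in 146 residues), the same calculation gives 6.85%, which rounds to the quoted 6.9%.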

We may divide the nucleotide sequence of a coding region into potential replacement sites and silent sites:

At replacement sites, a mutation alters the amino acid that is coded. The effect of the mutation (deleterious, neutral, or advantageous) depends on the result of the amino acid replacement.
At silent sites, mutation only substitutes one synonymous codon for another, so there is no change in the protein. Typically, replacement sites account for about 75% of a coding sequence and silent sites for about 25%.
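The split into replacement and silent sites can be estimated by asking, for each codon position, what fraction of the three possible point mutations leaves the amino acid unchanged. This is a simplified form of the Nei-Gojobori site-counting approach; the sketch assumes the standard genetic code, treats mutations to a stop codon as replacement changes, and weights all 61 sense codons equally (real genes have biased codon usage). The function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def silent_sites(codon: str) -> float:
    """Number of silent sites in a codon (0 to 3).

    Each position contributes the fraction of its three alternative bases
    that produce a synonymous codon.
    """
    aa = CODE[codon]
    total = 0.0
    for pos in range(3):
        synonymous = sum(
            1
            for base in BASES
            if base != codon[pos]
            and CODE[codon[:pos] + base + codon[pos + 1:]] == aa
        )
        total += synonymous / 3
    return total


def replacement_sites(codon: str) -> float:
    """Replacement sites are the remainder of the codon's three positions."""
    return 3 - silent_sites(codon)


if __name__ == "__main__":
    sense = [c for c, aa in CODE.items() if aa != "*"]
    mean_replacement = sum(replacement_sites(c) for c in sense) / len(sense)
    print(f"GCT (Ala): {replacement_sites('GCT'):.2f} replacement sites")
    print(f"mean over sense codons: {mean_replacement:.2f} replacement sites per codon")
```

Under these assumptions the average is about 2.27 replacement sites per codon (roughly 76% of positions), close to the 2.25 per codon and the 75:25 split quoted in the text. A codon such as GCT (alanine) has one fully silent third position, whereas ATG (methionine) has none.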

In addition to the coding sequence, a gene contains nontranslated regions. Here again, mutations are potentially neutral, apart from their effects on either secondary structure or (usually rather short) regulatory signals.

Although silent mutations are neutral with regard to the protein, they could affect gene expression via the sequence change in RNA. For example, a change in secondary structure might influence transcription, processing, or translation. Another possibility is that substituting a synonymous codon requires a different tRNA to decode it, which could influence the efficiency of translation.

The mutations at replacement sites should correspond to the amino acid divergence (the percentage of positions that differ in the protein sequence). A nucleic acid divergence of about 0.45% at replacement sites corresponds to an amino acid divergence of 1% (assuming that the average number of replacement sites per codon is 2.25). In fact, the measured divergence underestimates the number of changes that have occurred during evolution, because more than one mutation may have occurred at the same codon. A correction is usually made for this.
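The conversion between amino acid divergence and replacement-site divergence is simple arithmetic: per 100 codons there are about 225 replacement sites, and each amino acid change needs at least one nucleotide change among them. A minimal sketch, assuming exactly one nucleotide change per altered codon (that is, ignoring the multiple hits that the text says must be corrected for); the function name is hypothetical.

```python
def replacement_site_divergence(aa_divergence_pct: float,
                                replacement_sites_per_codon: float = 2.25) -> float:
    """Nucleotide divergence (%) at replacement sites implied by an amino acid divergence (%).

    Per 100 codons, aa_divergence_pct amino acids differ, each assumed to need one
    nucleotide change, spread over 100 * replacement_sites_per_codon sites.
    """
    return aa_divergence_pct / replacement_sites_per_codon


print(f"{replacement_site_divergence(1.0):.2f}%")  # 0.44%
```

The result, about 0.44%, matches the roughly 0.45% given in the text.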

To take the example of the human β- and δ-globin chains, there are 10 differences in 146 residues, a divergence of 6.9%. The DNA sequence has 31 changes in 441 nucleotides. However, these changes are distributed very differently between the replacement and silent sites: there are 11 changes in the 330 replacement sites, but 20 changes in only 111 silent sites. This gives (corrected) divergences of 3.7% at the replacement sites and 32% at the silent sites, a difference of almost an order of magnitude.
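The raw (uncorrected) divergences for β- and δ-globin follow directly from the counts. The text does not say which multiple-hit correction produced its 3.7% and 32%. Purely as an illustration, the sketch applies the Jukes-Cantor correction, a standard simple model that assumes all substitutions are equally likely. It gives smaller corrected values than those quoted (about 3.4% and 21%), so it should not be read as the method used here. The function names are hypothetical.

```python
import math


def raw_divergence(changes: int, sites: int) -> float:
    """Observed proportion of sites that differ (no correction for multiple hits)."""
    return changes / sites


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor estimate of substitutions per site from an observed proportion p.

    d = -3/4 * ln(1 - 4p/3); defined only for p < 0.75.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


if __name__ == "__main__":
    # Human beta- vs delta-globin counts quoted in the text.
    for label, changes, sites in [("replacement", 11, 330), ("silent", 20, 111)]:
        p = raw_divergence(changes, sites)
        print(f"{label}: raw {100 * p:.1f}%, Jukes-Cantor {100 * jukes_cantor(p):.1f}%")
```

Even without any correction, the silent sites have diverged more than five times as much as the replacement sites (18.0% versus 3.3%).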

The striking difference in the divergence of replacement and silent sites demonstrates that nucleotide positions that influence the protein sequence are under much greater constraint than those that do not. This suggests that very few of the amino acid changes are neutral.

Suppose we take the rate of mutation at silent sites to indicate the underlying rate of mutational fixation (this assumes that there is no selection at all at the silent sites). Then over the period since the β and δ genes diverged, there should have been changes at 32% of the 330 replacement sites, a total of 105. All but 11 of them have been eliminated, which means that ~90% of the mutations did not survive.
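The estimate of how many replacement-site mutations were eliminated is a two-step calculation. This sketch takes the text's assumption at face value, namely that the corrected silent-site divergence of 32% reflects the underlying rate of fixation at unconstrained sites; the function name is hypothetical.

```python
def fraction_eliminated(neutral_divergence: float, sites: int, observed_changes: int) -> float:
    """Fraction of expected changes at constrained sites that did not survive.

    expected = neutral_divergence * sites, i.e. what would have accumulated with no selection.
    """
    expected = neutral_divergence * sites
    return 1 - observed_changes / expected


expected = 0.32 * 330  # expected changes at the 330 replacement sites
print(f"expected {expected:.1f}, eliminated {100 * fraction_eliminated(0.32, 330, 11):.0f}%")
```

The expected count is about 105.6 changes, of which only 11 are observed, so roughly 90% were eliminated.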

The divergence between any pair of globin sequences is (more or less) proportional to the time since they separated. This provides an evolutionary clock that measures the accumulation of mutations at an apparently even rate during the evolution of a given protein.

The rate of divergence can be measured as the percent difference per million years, or as its reciprocal, the unit evolutionary period (UEP), the time in millions of years that it takes for 1% divergence to develop. Once the clock has been established by pairwise comparisons between species (remembering the practical difficulties in establishing the actual time of speciation), it can be applied to related genes within a species. From their divergence, we can calculate how much time has passed since the duplication that generated them.

By comparing the sequences of homologous genes in different species, the rate of divergence at both replacement and silent sites can be determined, as plotted in Figure 4.7.

In pairwise comparisons, there is an average divergence of 10% in the replacement sites of either the α- or β-globin genes of mammals that have been separated since the mammalian radiation occurred ~85 million years ago. This corresponds to a replacement divergence rate of 0.12% per million years.

The rate is steady when the comparison is extended to genes that diverged in the more distant past. For example, the average replacement divergence between corresponding mammalian and chicken globin genes is 23%. Relative to a separation ~270 million years ago, this gives a rate of 0.09% per million years.

Going further back, we can compare the α- with the β-globin genes within a species. They have been diverging since the individual gene types separated 500 million years ago (see Figure 4.6). They have an average replacement divergence of ~50%, which gives a rate of 0.1% per million years.

The summary of these data in Figure 4.7 shows that replacement divergence in the globin genes has an average rate of ~0.096% per million years (or a UEP of 10.4). Considering the uncertainties in estimating the times at which the species diverged, the results lend good support to the idea that there is a linear clock.
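The rates quoted for the three comparisons are divergence divided by time, and the UEP is the reciprocal of a rate. The sketch below reproduces those per-comparison rates and, as one assumed way of summarizing them, fits a least-squares line through the origin (zero divergence at zero time). Using only these three points it gives about 0.097% per million years (UEP ≈ 10.3), close to but not identical with the 0.096% (UEP 10.4) that the text reads from Figure 4.7. The names are hypothetical.

```python
# (replacement divergence %, time since separation in millions of years), as quoted in the text.
COMPARISONS = {
    "mammal vs mammal (alpha or beta)": (10.0, 85.0),
    "mammal vs chicken": (23.0, 270.0),
    "alpha vs beta within a species": (50.0, 500.0),
}


def rate(divergence_pct: float, time_myr: float) -> float:
    """Divergence rate in percent per million years."""
    return divergence_pct / time_myr


def uep(rate_pct_per_myr: float) -> float:
    """Unit evolutionary period: millions of years per 1% divergence."""
    return 1.0 / rate_pct_per_myr


def slope_through_origin(points) -> float:
    """Least-squares slope of divergence on time, forcing zero divergence at zero time."""
    num = sum(t * d for d, t in points)
    den = sum(t * t for _, t in points)
    return num / den


if __name__ == "__main__":
    for name, (d, t) in COMPARISONS.items():
        print(f"{name}: {rate(d, t):.3f}% per Myr")
    k = slope_through_origin(COMPARISONS.values())
    print(f"fitted rate {k:.3f}% per Myr, UEP {uep(k):.1f} Myr")
```

Forcing the line through the origin encodes the same assumption the text uses for silent sites: two genes that have only just separated show no divergence.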

The data on silent site divergence are much less clear. In every case, it is evident that the silent site divergence is much greater than the replacement site divergence, by a factor that varies from 2 to 10. But the spread of silent site divergences in pairwise comparisons is too great to show whether a clock is applicable (so we must base temporal comparisons on the replacement sites).

From Figure 4.7, it is clear that silent site divergence does not increase linearly with time. If we assume that there must be zero divergence at zero years of separation, the rate of silent site divergence is much greater during the first ~100 million years of separation. One interpretation is that roughly half of the silent sites are rapidly (within 100 million years) saturated by mutations; this fraction behaves as neutral sites. The other fraction accumulates mutations more slowly, at approximately the same rate as the replacement sites; these sites are silent with regard to the protein but come under selective pressure for some other reason.
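The two-fraction interpretation can be made concrete with a toy model. Every number here is an assumption chosen for illustration, not a fit to Figure 4.7: half the silent sites are treated as neutral and diverge under a Jukes-Cantor process at a hypothetical 2% per site per million years, so their divergence approaches its 75% ceiling within roughly 100 million years; the other half accumulate differences linearly at the replacement-site rate of 0.096% per million years. The function name is hypothetical.

```python
import math

NEUTRAL_FRACTION = 0.5      # assumed share of silent sites that are truly neutral
NEUTRAL_RATE = 0.02         # hypothetical divergence rate per neutral site per Myr
CONSTRAINED_RATE = 0.00096  # 0.096% per Myr, the replacement-site rate from the text


def silent_divergence(t_myr: float) -> float:
    """Toy expected silent-site divergence (fraction of sites differing) after t_myr.

    Neutral sites follow Jukes-Cantor and saturate at 0.75; constrained sites are
    assumed to accumulate differences linearly over the time range of interest.
    """
    neutral = 0.75 * (1 - math.exp(-4 / 3 * NEUTRAL_RATE * t_myr))
    constrained = CONSTRAINED_RATE * t_myr
    return NEUTRAL_FRACTION * neutral + (1 - NEUTRAL_FRACTION) * constrained


if __name__ == "__main__":
    for t in (0, 50, 100, 200, 300, 500):
        print(f"{t:>3} Myr: {100 * silent_divergence(t):5.1f}%")
```

The curve rises steeply for the first ~100 million years and then climbs at the slow, replacement-like rate, which is the non-linear shape described above.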

Now we can reverse the calculation of divergence rates to estimate how long genes within a species have been apart. The difference between the human β and δ genes is 3.7% at replacement sites. At a UEP of 10.4, these genes must have diverged roughly 40 million years ago (10.4 × 3.7 ≈ 38), around the time of the separation of the lines leading to New World monkeys, Old World monkeys, great apes, and man. All of these higher primates have both β and δ genes, which suggests that the gene divergence commenced just before this point in evolution.

Proceeding further back, the divergence between the replacement sites of γ and ε genes is 10%, which corresponds to a time of separation ~100 million years ago. The separation between embryonic and fetal globin genes therefore may have just preceded or accompanied the mammalian radiation.
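Reversing the clock is a single multiplication: time since separation ≈ UEP × replacement divergence (%). A minimal sketch using the UEP of 10.4 from the text; it inherits all the uncertainty of that calibration, and the function name is hypothetical.

```python
UEP_GLOBIN = 10.4  # millions of years per 1% replacement divergence (from the text)


def time_since_separation(replacement_divergence_pct: float, uep: float = UEP_GLOBIN) -> float:
    """Estimated millions of years since two genes began to diverge."""
    return uep * replacement_divergence_pct


for pair, divergence in [("beta/delta", 3.7), ("gamma/epsilon", 10.0)]:
    print(f"{pair}: ~{time_since_separation(divergence):.0f} million years")
```

This gives about 38 and 104 million years, the ~40 and ~100 million years quoted once rounded to the precision the calibration supports.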

An evolutionary tree for the human globin genes is constructed in Figure 4.8. Features that evolved before the mammalian radiation, such as the separation of β/δ from γ, should be found in all mammals. Features that evolved afterward, such as the separation of the β- and δ-globin genes, should be found only in individual lineages of mammals.

In each species, there have been comparatively recent changes in the structures of the clusters, since we see differences in gene number (one adult β-globin gene in man, two in mouse) or in type (most often concerning whether there are separate embryonic and fetal genes).

When sufficient data have been collected on the sequences of a particular gene, the arguments can be reversed, and comparisons between genes in different species can be used to assess taxonomic relationships.

October 14, 2012