Crossover fixation could maintain identical repeats

  • Concerted evolution describes the ability of two related genes to evolve together as though constituting a single locus.
  • Coincidental evolution (Coevolution) describes a situation in which two genes evolve together as a single unit.
  • Gene conversion is the alteration of one strand of a heteroduplex DNA to make it complementary with the other strand at any position(s) where there were mispaired bases.
  • Crossover fixation refers to a possible consequence of unequal crossing-over that allows a mutation in one member of a tandem cluster to spread through the whole cluster (or to be eliminated).
  • Unequal crossing-over changes the size of a cluster of tandem repeats.
  • Individual repeating units can be eliminated or can spread through the cluster. 

The same problem is encountered whenever a gene has been duplicated. How can selection be imposed to prevent the accumulation of deleterious mutations?
The duplication of a gene is likely to result in an immediate relaxation of the evolutionary pressure on its sequence. Now that there are two identical copies, a change in the sequence of either one will not deprive the organism of a functional protein, since the original amino acid sequence continues to be coded by the other copy. Then the selective pressure on the two genes is diffused, until one of them mutates sufficiently away from its original function to refocus all the selective pressure on the other.
Immediately following a gene duplication, changes might accumulate more rapidly in one of the copies, leading eventually to a new function (or to its disuse in the form of a pseudogene). If a new function develops, the gene then evolves at the same, slower rate characteristic of the original function. Probably this is the sort of mechanism responsible for the separation of functions between embryonic and adult globin genes.
Yet there are instances where duplicated genes retain the same function, coding for the identical or nearly identical proteins. Identical proteins are coded by the two human α-globin genes, and there is only a single amino acid difference between the two γ-globin proteins. How is selective pressure exerted to maintain their sequence identity?
The most obvious possibility is that the two genes do not actually have identical functions, but differ in some (undetected) property, such as time or place of expression. Another possibility is that the need for two copies is quantitative, because neither by itself produces a sufficient amount of protein.
In more extreme cases of repetition, however, it is impossible to avoid the conclusion that no single copy of the gene is essential. When there are many copies of a gene, the immediate effects of mutation in any one copy must be very slight. The consequences of an individual mutation are diluted by the large number of copies of the gene that retain the wild-type sequence. Many mutant copies could accumulate before a lethal effect is generated.
Lethality becomes quantitative, a conclusion reinforced by the observation that half of the units of the rDNA cluster of X. laevis or D. melanogaster can be deleted without ill effect. So how are these units prevented from gradually accumulating deleterious mutations? And what chance is there for the rare favorable mutation to display its advantages in the cluster?
The basic principle of models to explain the maintenance of identity among repeated copies is to suppose that nonallelic genes are not independently inherited, but must be continually regenerated from one of the copies of a preceding generation. In the simplest case of two identical genes, when a mutation occurs in one copy, either it is by chance eliminated (because the sequence of the other copy takes over), or it is spread to both duplicates (because the mutant copy becomes the dominant version). Spreading exposes a mutation to selection. The result is that the two genes evolve together as though only a single locus existed. This is called coincidental evolution or concerted evolution (occasionally coevolution). It can be applied to a pair of identical genes or (with further assumptions) to a cluster containing many genes.
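The two-copy case can be sketched as a toy simulation. It assumes, purely for illustration, that each regeneration event copies one randomly chosen gene over the other; `regenerate` and `fate_of_new_mutation` are hypothetical names, not part of any published model. Under that assumption a new mutation is either lost or carried by both copies after a single event, each about half the time.

```python
import random


def regenerate(pair, rng):
    """One regeneration event: a randomly chosen copy overwrites the other (toy assumption)."""
    donor = pair[rng.randrange(2)]
    return (donor, donor)


def fate_of_new_mutation(trials=10_000, seed=0):
    """Count how often a new mutation in copy 1 spreads to both copies or is lost."""
    rng = random.Random(seed)
    spread = lost = 0
    for _ in range(trials):
        pair = regenerate(("mutant", "wild type"), rng)  # mutation has just arisen in copy 1
        if pair == ("mutant", "mutant"):
            spread += 1   # both copies now carry it, so it is exposed to selection
        else:
            lost += 1     # the other copy's sequence took over
    return spread, lost


print(fate_of_new_mutation())
```

After one event the pair is always identical again, which is the sense in which the two genes behave as a single locus.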
One mechanism supposes that the sequences of the nonallelic genes are directly compared with one another and homogenized by enzymes that recognize any differences. This can be done by exchanging single strands between them to form genes in which one strand derives from one copy and the other strand from the other copy. Any differences show up as improperly paired bases, which attract attention from enzymes able to excise and replace a base, so that only A·T and G·C pairs survive. This type of event is called gene conversion and is associated with genetic recombination, as described in Chapter 15 (Recombination and repair).
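The correction step can be sketched on a short heteroduplex. This is a toy model, not a description of any real repair enzyme: `correct_mismatches` is a hypothetical helper, the bottom strand is written 3'->5' so that each base faces its partner, and either the top strand is always used as template or (when a random generator is supplied) each mismatch picks its template strand at random.

```python
import random

# Watson-Crick partners; any other pairing counts as a mismatch.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def correct_mismatches(top, bottom, rng=None):
    """Return (top, bottom) after every mispaired position has been corrected.

    `bottom` is written 3'->5', so bottom[k] faces top[k]. Without `rng` the
    top strand is always the template; with `rng` each mismatch chooses its
    template strand at random (a hypothetical simplification).
    """
    if len(top) != len(bottom):
        raise ValueError("strands must be the same length")
    new_top, new_bottom = [], []
    for t, b in zip(top, bottom):
        if PAIR[t] != b:                  # improperly paired base
            if rng is None or rng.random() < 0.5:
                b = PAIR[t]               # excise and replace in the bottom strand
            else:
                t = PAIR[b]               # excise and replace in the top strand
        new_top.append(t)
        new_bottom.append(b)
    return "".join(new_top), "".join(new_bottom)


# Heteroduplex: top strand from copy 1 (ATGCGT), bottom strand from copy 2 (ATGAGT).
copy2_bottom = "".join(PAIR[base] for base in "ATGAGT")   # "TACTCA"
print(correct_mismatches("ATGCGT", copy2_bottom))         # ('ATGCGT', 'TACGCA')
# With random template choice, either copy's sequence may win at the mismatch.
print(correct_mismatches("ATGCGT", copy2_bottom, random.Random(1)))
```

Either way, both strands end up carrying the sequence of one original copy over the converted region.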
We should be able to ascertain the scope of such events by comparing the sequences of duplicate genes. If they are subject to concerted evolution, we should not see the accumulation of silent site substitutions between them (because the homogenization process applies to these as well as to the replacement sites). We know that the maintenance mechanism need not extend beyond the gene itself, since there are cases of duplicate genes whose flanking sequences are entirely different. Indeed, we may see abrupt boundaries that mark the ends of the sequences that were homogenized.
We must remember that the existence of such mechanisms can invalidate the determination of the history of such genes via their divergence, because the divergence reflects only the time since the last homogenization/regeneration event, not the original duplication.
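The dating problem can be made concrete with a simple molecular-clock calculation. This sketch assumes a constant substitution rate r per site per year in each copy, ignores multiple hits at the same site, and treats homogenization as resetting divergence to zero; all numbers are hypothetical. Two copies changing independently accumulate d = 2rt, so the estimate t = d/(2r) recovers only the time since the last homogenization.

```python
def expected_divergence(rate, years):
    """Per-site divergence after `years` of independent change in two copies (d = 2rt)."""
    return 2 * rate * years


def apparent_age(divergence, rate):
    """Clock estimate of when the copies separated (t = d / 2r)."""
    return divergence / (2 * rate)


RATE = 1e-9                 # hypothetical substitutions per site per year
DUPLICATION = 100e6         # hypothetical: genes duplicated 100 million years ago
LAST_HOMOGENIZATION = 10e6  # hypothetical: last conversion 10 million years ago

# Divergence restarts from zero at each homogenization event.
observed = expected_divergence(RATE, LAST_HOMOGENIZATION)
print(f"observed divergence: {observed:.3f}")
print(f"apparent age: {apparent_age(observed, RATE) / 1e6:.0f} Myr "
      f"(true duplication: {DUPLICATION / 1e6:.0f} Myr)")
```

The clock reads about 10 Myr, a tenth of the true age of the duplication.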
The crossover fixation model supposes that an entire cluster is subject to continual rearrangement by the mechanism of unequal crossing-over. Such events can explain the concerted evolution of multiple genes if unequal crossing-over causes all the copies to be regenerated physically from one copy.
Following an unequal crossing-over event of the sort that generates a triple locus, for example, the chromosome carrying the triple locus could suffer deletion of one of the genes. Of the two remaining genes, 1½ represent the sequence of one of the original copies; only ½ of the sequence of the other original copy has survived. Any mutation in the first region now exists in both genes and is subject to selective pressure.
Tandem clustering provides frequent opportunities for "mispairing" of genes whose sequences are the same, but that lie in different positions in their clusters. By continually expanding and contracting the number of units via unequal crossing-over, it is possible for all the units in one cluster to be derived from a rather small proportion of those in an ancestral cluster. The variable lengths of the spacers are consistent with the idea that unequal crossing-over events take place in spacers that are internally mispaired. This can explain the homogeneity of the genes compared with the variability of the spacers. The genes are exposed to selection when individual repeating units are amplified within the cluster, but the spacers are not under selection and can accumulate changes.
In a region of nonrepetitive DNA, recombination occurs between precisely matching points on the two homologous chromosomes, generating reciprocal recombinants. The basis for this precision is the ability of two duplex DNA sequences to align exactly. We know that unequal recombination can occur when there are multiple copies of genes whose exons are related, even though their flanking and intervening sequences may differ. This happens because of the mispairing between corresponding exons in nonallelic genes.
Imagine how much more frequently misalignment must occur in a tandem cluster of identical or nearly identical repeats. Except at the very ends of the cluster, the close relationship between successive repeats makes it impossible even to define the exactly corresponding repeats! This has two consequences: there is continual adjustment of the size of the cluster; and there is homogenization of the repeating unit.
Consider a sequence consisting of a repeating unit "ab" with ends "x" and "y." If we write one chromosome above the other, the exact alignment between "allelic" sequences would be:
    xababababababababy
    xababababababababy

But probably any sequence ab in one chromosome could pair with any sequence ab in the other chromosome. In a misalignment such as:
    xababababababababy
        xababababababababy
the region of pairing is no less stable than in the perfectly aligned pair, although it is shorter. We do not know very much about how pairing is initiated prior to recombination, but very likely it starts between short corresponding regions and then spreads. If it starts within satellite DNA, it is more likely than not to involve repeating units that do not have exactly corresponding locations in their clusters.
Now suppose that a recombination event occurs within the unevenly paired region. The recombinants will have different numbers of repeating units. In one case, the cluster has become longer; in the other, it has become shorter,
    xabababab×ababababababy
    xabab×ababababy
where "×" indicates the site of the crossover.
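The bookkeeping of an unequal crossover can be sketched with the "ab" repeats and "x"/"y" ends used in the text. This is an illustration only: `unequal_crossover` and `units` are hypothetical helpers, each chromosome is a string, and a crossover is modeled as breaking both strings and joining the left part of each to the right part of the other. When the two break points do not correspond, one product gains exactly the units the other loses.

```python
def unequal_crossover(chrom1, chrom2, cut1, cut2):
    """Reciprocal exchange: chrom1 is broken before index cut1, chrom2 before cut2.

    With corresponding cut points (exact alignment) both products keep their
    length; otherwise one product gains what the other loses.
    """
    return chrom1[:cut1] + chrom2[cut2:], chrom2[:cut2] + chrom1[cut1:]


def units(chrom, unit="ab"):
    """Number of repeating units between the unique ends x and y."""
    return chrom.count(unit)


cluster = "x" + "ab" * 8 + "y"   # 8 repeating units flanked by x and y
# Misaligned by two units: unit 5 of chromosome 1 pairs with unit 3 of
# chromosome 2, and the crossover falls just before them.
longer, shorter = unequal_crossover(cluster, cluster, cut1=9, cut2=5)
print(units(longer), units(shorter))   # prints: 10 6
```

Passing equal cut points (for example `cut1=9, cut2=9`) models the exact alignment, and both products then keep all 8 units.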

If this type of event is common, clusters of tandem repeats will undergo continual expansion and contraction. This can cause a particular repeating unit to spread through the cluster, as illustrated in Figure 4.18. Suppose that the cluster consists initially of a sequence abcde, where each letter represents a repeating unit. The different repeating units are closely enough related to one another to mispair for recombination. Then, through a series of unequal recombination events, the repetitive region increases or decreases in size, and one unit spreads to replace all the others.
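The spread of one unit through an abcde cluster can be sketched as a small simulation. It rests on several simplifying assumptions that are not from the text: each event is an unequal exchange between two misaligned copies of the same cluster, only one of the two reciprocal products is kept, and the cluster is arbitrarily held between 1 and 12 units. `unequal_exchange` and `run_until_fixed` are hypothetical names.

```python
import random


def unequal_exchange(cluster, rng, min_len=1, max_len=12):
    """One unequal crossover between two misaligned copies of `cluster`.

    Copy 1 is cut at boundary i and copy 2 at boundary j (i != j), giving the
    reciprocal products cluster[:i] + cluster[j:] and cluster[:j] + cluster[i:].
    One product is kept at random; an event that would take the cluster
    outside [min_len, max_len] units is rejected and the cluster is unchanged.
    """
    i, j = rng.sample(range(len(cluster) + 1), 2)   # two distinct cut points
    kept = rng.choice([cluster[:i] + cluster[j:], cluster[:j] + cluster[i:]])
    return kept if min_len <= len(kept) <= max_len else cluster


def run_until_fixed(cluster, seed, max_events=100_000):
    """Repeat unequal exchanges until every unit is the same; return (cluster, events)."""
    rng = random.Random(seed)
    for event in range(1, max_events + 1):
        cluster = unequal_exchange(cluster, rng)
        if len(set(cluster)) == 1:   # one unit type has taken over
            return cluster, event
    return cluster, max_events


final, events = run_until_fixed("abcde", seed=42)
print(final, events)
```

Because an exchange only duplicates or deletes existing units, no new unit types appear, and a cluster made of a single unit type stays that way; the cluster therefore drifts toward homogeneity while its length fluctuates. Which unit wins, and how many events it takes, depends on the seed.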

The crossover fixation model predicts that any sequence of DNA that is not under selective pressure will be taken over by a series of identical tandem repeats generated in this way (for review see Charlesworth, Sniegowski, and Stephan, 1994). The critical assumption is that the process of crossover fixation is fairly rapid relative to mutation, so that new mutations either are eliminated (their repeats are lost) or come to take over the entire cluster. In the case of the rDNA cluster, of course, a further factor is imposed by selection for an effective transcribed sequence.