October 14, 2012

The rate of neutral substitution can be measured from divergence of repeated sequences


KEY CONCEPTS:
  • The rate of substitution per year at neutral sites is greater in the mouse than in the human genome.

We can make the best estimate of the rate of substitution at neutral sites by examining sequences that do not code for protein. (We use the term neutral here rather than silent, because there is no coding potential.) An informative comparison can be made between the members of a common repetitive family in the human and mouse genomes (Waterston et al., 2002).

The principle of the analysis is summarized in Figure 4.9. We start with a family of related sequences that have evolved by duplication and substitution from an original family member. We assume that the common ancestral sequence can be deduced by taking the base that is most common at each position. Then we can calculate the divergence of each individual family member as the proportion of bases that differ from the deduced ancestral sequence. In this example, the divergence of individual members ranges from 0.13 to 0.18, and the average is 0.16.
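
The consensus-and-divergence calculation described above can be sketched in a few lines of Python. This is only an illustration: the four short sequences are invented toy data (not the family in Figure 4.9), the function names are hypothetical, and the sketch assumes the family members are already aligned, of equal length, and free of gaps.

```python
from collections import Counter

def consensus(seqs):
    """Deduce the ancestral sequence as the most common base at each position.

    Assumes pre-aligned, equal-length sequences. If two bases tie at a
    position, the one seen first in that column is chosen.
    """
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))

def divergence(seq, ancestor):
    """Proportion of positions at which seq differs from the deduced ancestor."""
    differences = sum(a != b for a, b in zip(seq, ancestor))
    return differences / len(ancestor)

# Hypothetical toy family, invented for illustration only.
family = [
    "ACGTACGTAC",
    "ACGTACGTTC",
    "ACCTACGTAC",
    "ACGTAGGTAC",
]

ancestor = consensus(family)
per_member = [divergence(seq, ancestor) for seq in family]
average = sum(per_member) / len(per_member)

print(ancestor)
print(per_member)
print(round(average, 3))
```

With real repeat families the alignment step and the handling of gaps and ambiguous bases matter a great deal; they are deliberately left out of this sketch.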
One family used for this analysis in the human and mouse genomes derives from a sequence that is thought to have ceased to be active at about the time of the divergence between Man and rodents (the LINES family; see 17.9 Retroposons fall into three classes). This means that it has been diverging without any selective pressure for the same length of time in both species. Its average divergence in Man is ~0.17 substitutions per site, corresponding to a rate of 2.2 × 10⁻⁹ substitutions per base per year over the 75 million years since the separation. In the mouse genome, however, neutral substitutions have occurred at twice this rate, corresponding to 0.34 substitutions per site in the family, or a rate of 4.5 × 10⁻⁹ substitutions per base per year. Note, however, that if we calculated the rate per generation instead of per year, it would be greater in Man than in mouse (~2.2 × 10⁻⁸ as opposed to ~10⁻⁹).
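
The per-year rates follow from dividing the average divergence by the time since the two lineages separated. A minimal sketch of that arithmetic, using only the figures given in the text (0.17 and 0.34 substitutions per site, 75 million years); the constant and function names are hypothetical:

```python
# Time since the human and rodent lineages separated, as given in the text.
DIVERGENCE_TIME_YEARS = 75e6

def rate_per_year(substitutions_per_site, years=DIVERGENCE_TIME_YEARS):
    """Average neutral substitution rate per base per year."""
    return substitutions_per_site / years

human_rate = rate_per_year(0.17)
mouse_rate = rate_per_year(0.34)

print(f"human: {human_rate:.2e} substitutions per base per year")
print(f"mouse: {mouse_rate:.2e} substitutions per base per year")
print(f"mouse/human ratio: {mouse_rate / human_rate:.1f}")
```

The results, roughly 2.3 × 10⁻⁹ and 4.5 × 10⁻⁹, agree with the quoted values to within rounding. Converting to a per-generation rate would require multiplying by a generation time for each species, which is not stated here and so is not attempted in the sketch.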
These figures probably underestimate the rate of substitution in the mouse, because at the time of divergence the rates in both species would have been the same, and the difference must have evolved since then. The current rate of neutral substitution per year in the mouse is probably 2-3× greater than the historical average. These rates reflect the balance between the occurrence of mutations and the ability of the genetic system of the organism to correct them. The difference between the species demonstrates that each species has systems that operate with a characteristic efficiency.
Comparing the mouse and human genomes allows us to assess whether syntenic (corresponding) sequences show signs of conservation or have diverged at the rate expected from the accumulation of neutral substitutions. The proportion of sites that show signs of selection is ~5%. This is much higher than the proportion that codes for protein or RNA (~1%). It implies that the genome includes many more stretches whose sequence is important for non-coding functions than for coding functions. Known regulatory elements are likely to comprise only a small part of this proportion. This number also suggests that most of the genome (that is, the remaining ~95%) does not have any function that depends on the exact sequence.