Molecular Biology: Pseudogenes are dead ends of evolution

KEY CONCEPTS:

Pseudogenes have no coding function, but they can be recognized by sequence similarities with existing functional genes. They arise by the accumulation of mutations in (formerly) functional genes.

Pseudogenes (Ψ) are defined by their possession of sequences that are related to those of the functional genes, but that cannot be translated into a functional protein.

Some pseudogenes have the same general structure as functional genes, with sequences corresponding to exons and introns in the usual locations. They may have been rendered inactive by mutations that prevent any or all of the stages of gene expression. The changes can take the form of abolishing the signals for initiating transcription, preventing splicing at the exon-intron junctions, or prematurely terminating translation.

Usually a pseudogene has several deleterious mutations. Presumably once it ceased to be active, there was no impediment to the accumulation of further mutations. Pseudogenes that represent inactive versions of currently active genes have been found in many systems, including the globins, immunoglobulins, and histocompatibility antigens, where they are located in the vicinity of the gene cluster, often interspersed with the active genes.

A typical example is the rabbit pseudogene, Ψβ2, which has the usual organization of exons and introns, and is related most closely to the functional globin gene β1. But it is not functional. Figure 4.10 summarizes the many changes that have occurred in the pseudogene. The deletion of a base pair at codon 20 of Ψβ2 has caused a frameshift that would lead to termination shortly afterward. Several point mutations have changed later codons representing amino acids that are highly conserved in the β globins. Neither of the two introns any longer possesses recognizable boundaries with the exons, so probably the introns could not be spliced out even if the gene were transcribed. However, there are no transcripts corresponding to the gene, possibly because there have been changes in the 5′ flanking region.
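The effect of a single-base deletion on the reading frame can be illustrated with a small sketch. The sequence below is a made-up toy open reading frame, not the rabbit Ψβ2 or β1 sequence, and the deletion is placed at codon 3 rather than codon 20 purely to keep the example short. The helper names `translate` and `delete_base` are hypothetical.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate from the first base and stop at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon: translation terminates here
            break
        protein.append(amino_acid)
    return "".join(protein)


def delete_base(dna, codon_number):
    """Delete the first base of the given codon (codons numbered from 1)."""
    position = 3 * (codon_number - 1)
    return dna[:position] + dna[position + 1:]


# Toy coding sequence invented for illustration only.
TOY_ORF = "ATGGCACATCTGACCGGTGAAAAATCCTAA"

intact_protein = translate(TOY_ORF)
frameshifted_protein = translate(delete_base(TOY_ORF, 3))

print(intact_protein)        # full-length toy product
print(frameshifted_protein)  # truncated product after the frameshift
```

In this toy case the deletion shifts every downstream codon by one base, and the new frame happens to contain a stop codon almost immediately, which is the same kind of outcome the text describes for Ψβ2.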

Since this list of defects includes mutations potentially preventing each stage of gene expression, we have no means of telling which event originally inactivated this gene. However, from the divergence between the pseudogene and the functional gene, we can estimate when the pseudogene originated and when its mutations started to accumulate.

If the pseudogene had become inactive as soon as it was generated by duplication from β1, we should expect both replacement site and silent site divergence rates to be the same. (They will be different only if the gene is translated to create selective pressure on the replacement sites.) But actually there are fewer replacement site substitutions than silent site substitutions. This suggests that at first (while the gene was expressed) there was selection against replacement site substitution. From the relative extents of substitution in the two types of site, we can calculate that Ψβ2 diverged from β1 ~55 million years ago, remained a functional gene for 22 million years, but has been a pseudogene for the last 33 million years.
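The reasoning behind this kind of estimate can be sketched under a deliberately simplified model. The assumptions here are not taken from the text: substitution rates are constant, no correction is made for multiple substitutions at one site, β1 stays under selection throughout, and once Ψβ2 is inactive its replacement sites change at the same rate as silent sites. The function name, the rates, and the divergence values are all hypothetical, and the inputs were chosen only so that the output reproduces the 55, 22, and 33 million year figures quoted above. They are not measured data.

```python
def pseudogene_timeline(d_silent, d_replacement, rate_silent, rate_replacement):
    """Estimate (total, functional, inactive) times in millions of years.

    d_silent, d_replacement: divergence between pseudogene and active gene
        (substitutions per site) at silent and replacement sites.
    rate_silent, rate_replacement: substitutions per site per million years
        in a single lineage, for neutral and constrained sites.
    """
    # Silent sites diverge at the neutral rate in both lineages all the time.
    total = d_silent / (2 * rate_silent)

    # Replacement divergence expected if BOTH copies had stayed functional.
    constrained = 2 * rate_replacement * total

    # The excess arose while the pseudogene was free of selection, when its
    # replacement sites changed at the silent rate instead of the slow rate.
    inactive = (d_replacement - constrained) / (rate_silent - rate_replacement)

    functional = total - inactive
    return total, functional, inactive


# Illustrative inputs only (assumed, not measured).
total, functional, inactive = pseudogene_timeline(
    d_silent=0.44,
    d_replacement=0.209,
    rate_silent=0.004,
    rate_replacement=0.001,
)
print(round(total), round(functional), round(inactive))
```

If the pseudogene had been inactive from the moment of duplication, the same model would give `functional` close to zero, which is the situation described later for the mouse Ψα3 gene.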

Similar calculations can be made for other pseudogenes. Some appear to have been active for some time before becoming pseudogenes, but others appear to have been inactive from the very time of their original generation. The general point made by the structures of these pseudogenes is that each has evolved independently during the development of the globin gene cluster in each species. This reinforces the conclusion that the creation of new genes, followed by their acceptance as functional duplicates, variation to become new functional genes, or inactivation as pseudogenes, is a continuing process in the gene cluster. Most gene families have members that are pseudogenes. Usually the pseudogenes represent a small minority of the total gene number.

The mouse Ψα3 globin gene has an interesting property: it precisely lacks both introns. Its sequence can be aligned (allowing for accumulated mutations) with the α-globin mRNA. The apparent time of inactivation coincides with the original duplication, which suggests that the original inactivating event was associated with the loss of introns.

Inactive genomic sequences that resemble the RNA transcript are called processed pseudogenes. They originate by insertion at some random site of a product derived from the RNA, following a retrotransposition event, as discussed in Retroviruses and retroposons. Their characteristic features are summarized in Figure 17.19.
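The idea that a processed pseudogene aligns with the mRNA rather than with the interrupted gene can be shown with a toy comparison. Everything below is hypothetical: the sequences are invented, the function names are illustrative, and the comparison is a plain position-by-position mismatch count. Real analyses would use a proper alignment method that tolerates insertions and deletions.

```python
def splice(gene, exons):
    """Join the exon segments of a gene; exons are (start, end) half-open."""
    return "".join(gene[start:end] for start, end in exons)


def mismatches(a, b):
    """Count differing positions between two sequences of equal length."""
    return sum(1 for x, y in zip(a, b) if x != y)


def looks_processed(candidate, gene, exons, max_mismatches=2):
    """True if the candidate matches the intron-free (spliced) sequence."""
    spliced = splice(gene, exons)
    if len(candidate) != len(spliced):
        return False  # this toy check does not handle insertions/deletions
    return mismatches(candidate, spliced) <= max_mismatches


# Invented toy gene: two exons separated by one intron.
EXON1 = "ATGGCACATCTG"
INTRON = "GTAAGTCCTTTTCAG"
EXON2 = "ACCGGTGAAAAATCCTAA"
gene = EXON1 + INTRON + EXON2
exons = [
    (0, len(EXON1)),
    (len(EXON1) + len(INTRON), len(gene)),
]

# A candidate copy: the spliced sequence carrying one point mutation.
spliced = splice(gene, exons)
candidate = spliced[:5] + "T" + spliced[6:]

print(looks_processed(candidate, gene, exons))  # intron-free copy
print(looks_processed(gene, gene, exons))       # copy still holding the intron
```

The first call reports a match because the candidate lacks the intron precisely, while the second fails because the intact gene still contains it.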

If pseudogenes are evolutionary dead ends, simply an unwanted accompaniment to the rearrangement of functional genes, why are they still present in the genome? Do they fulfill any function or are they entirely without purpose, in which case there should be no selective pressure for their retention?

We should remember that we see those genes that have survived in present populations. In past times, any number of other pseudogenes may have been eliminated. This elimination could occur by deletion of the sequence as a sudden event or by the accretion of mutations to the point where the pseudogene can no longer be recognized as a member of its original sequence family (probably the ultimate fate of any pseudogene that is not suddenly eliminated).

Even relics of evolution can be duplicated. In the β-globin genes of the goat, there are three adult species, βA, βB, and βC (see Figure 4.5). Each of these has a pseudogene a few kb upstream of it. The pseudogenes are more closely related to each other than to the adult β-globin genes; in particular, they share several inactivating mutations. Also, the adult β-globin genes are more closely related to each other than to the pseudogenes. This implies that an original Ψβ-β structure was itself duplicated, giving functional β genes (which diverged further) and two nonfunctional genes (which diverged into the current pseudogenes).

The mechanisms responsible for gene duplication, deletion, and rearrangement act on all sequences that are recognized as members of the cluster, whether or not they are functional. It is left to selection to discriminate among the products.

By definition, pseudogenes do not code for proteins, and usually they have no function at all, but in at least one exceptional case, a pseudogene has a regulatory function. Transcription of a pseudogene inhibits degradation of the mRNA produced by its homologous active gene (Hirotsune et al., 2003). Probably there is a protein responsible for this degradation that binds a specific sequence in the mRNA. If this sequence is also present in the RNA transcribed from the pseudogene, the effect of the protein will be diluted when the pseudogene is transcribed. It is not clear how common such effects may be, but as a general rule, we might expect dilution effects of this type to be possible whenever pseudogenes are transcribed.

October 14, 2012
