- The major evolutionary question is whether genes originated as sequences interrupted by introns or whether they were originally uninterrupted.
- Most protein-coding genes probably originated in an interrupted form, but interrupted genes that code for RNA may have originally been uninterrupted.
- A special class of introns is mobile and can insert itself into genes.
The highly interrupted structure of eukaryotic genes suggests a picture of the eukaryotic genome as a sea of introns (mostly but not exclusively unique in sequence), in which islands of exons (sometimes very short) are strung out in individual archipelagoes that constitute genes.
What was the original form of genes that today are interrupted?
- The "introns early" model supposes that introns have always been an integral part of the gene. Genes originated as interrupted structures, and those without introns have lost them in the course of evolution.
- The "introns late" model supposes that the ancestral protein-coding units consisted of uninterrupted sequences of DNA. Introns were subsequently inserted into them.
A test of the models is to ask whether the difference between eukaryotic and prokaryotic genes can be accounted for by the acquisition of introns in the eukaryotes or by the loss of introns from the prokaryotes.
The introns early model suggests that the mosaic structure of genes is a remnant of an ancient approach to the reconstruction of genes to make novel proteins. Suppose that an early cell had a number of separate protein-coding sequences. One aspect of its evolution is likely to have been the reorganization and juxtaposition of different polypeptide units to build up new proteins.
If the protein-coding unit must be a continuous series of codons, every such reconstruction would require a precise recombination of DNA to place the two protein-coding units in register, end to end in the same reading frame. Furthermore, if this combination is not successful, the cell has been damaged, because it has lost the original protein-coding units.
But if an approximate recombination of DNA could place the two protein-coding units within the same transcription unit, splicing patterns could be tried out at the level of RNA to combine the two proteins into a single polypeptide chain. And if these combinations are not successful, the original protein-coding units remain available for further trials. Such an approach essentially allows the cell to try out controlled deletions in RNA without suffering the damaging instability that could occur from applying this procedure to DNA. This argument is supported by the fact that we can find related exons in different genes, as though the gene had been assembled by mixing and matching exons (see 2.10 Some exons can be equated with protein functions).
Figure 2.21 illustrates the outcome when a random sequence that includes an exon is translocated to a new position in the genome. Exons are very small relative to introns, so it is likely that the exon will find itself within an intron. Because only the sequences at the exon-intron junctions are required for splicing, the exon is likely to be flanked by functional 3′ and 5′ splice junctions, respectively. Because splicing junctions are recognized in pairs, the 5′ splicing junction of the original intron is likely to interact with the 3′ splicing junction introduced by the new exon, instead of with its original partner. Similarly, the 5′ splicing junction of the new exon will interact with the 3′ splicing junction of the original intron. The result is to insert the new exon into the RNA product between the original two exons. So long as the new exon is in the same coding frame as the original exons, a new protein sequence will be produced. This type of event could have been responsible for generating new combinations of exons during evolution. Note that the principle of this type of event is mimicked by the technique of exon trapping that is used to screen for functional exons (see Figure 2.12).
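The logic of this event can be followed in a toy model. The sketch below is only an illustration and makes several assumptions: the sequences are invented, every intron that begins with GT and ends with AG is taken to be spliced out, and the function names (`splice`, `insert_fragment`, `has_junctions`, `keeps_frame`) are hypothetical. It shows a translocated fragment, carrying an exon with its own flanking junctions, landing inside an existing intron and becoming part of the spliced product.

```python
# Toy model of exon insertion by translocation (illustrative only).
# A transcript is a list of (kind, sequence) segments; kind is "exon" or "intron".
# Assumption: any intron that starts with GT and ends with AG is spliced out.

def splice(segments):
    """Return the mature RNA: exon sequences joined in order, introns removed."""
    return "".join(seq for kind, seq in segments if kind == "exon")


def insert_fragment(segments, intron_index, offset, fragment):
    """Insert a translocated fragment into the intron at segments[intron_index].

    fragment is (upstream_flank, exon, downstream_flank): the new exon plus the
    intron sequence on either side that carries its 3' and 5' junctions.
    The target intron is cut at `offset`; each half joins one flank to form
    a new intron.
    """
    kind, seq = segments[intron_index]
    if kind != "intron":
        raise ValueError("target segment is not an intron")
    upstream_flank, exon, downstream_flank = fragment
    left_intron = seq[:offset] + upstream_flank
    right_intron = downstream_flank + seq[offset:]
    return (segments[:intron_index]
            + [("intron", left_intron), ("exon", exon), ("intron", right_intron)]
            + segments[intron_index + 1:])


def has_junctions(intron):
    """Simplified junction check: intron begins with GT and ends with AG."""
    return intron.startswith("GT") and intron.endswith("AG")


def keeps_frame(exon):
    """An inserted exon leaves the downstream frame intact if its length is a multiple of 3."""
    return len(exon) % 3 == 0


# Invented sequences: two exons separated by one intron.
original = [("exon", "ATGGCC"), ("intron", "GTAAGTTTTCAG"), ("exon", "GGATAA")]
# Invented fragment: new exon AAACCC with a 3' junction upstream and a 5' junction downstream.
fragment = ("TTTCAG", "AAACCC", "GTAAGT")
rearranged = insert_fragment(original, 1, 6, fragment)

print(splice(original))    # ATGGCCGGATAA
print(splice(rearranged))  # ATGGCCAAACCCGGATAA
```

In this example both new introns still satisfy the simplified junction rule, and the six-base exon leaves the downstream codons in frame. The frame check is deliberately crude: it ignores the possibility that the new exon introduces a termination codon.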
Alternative forms of genes for rRNA and tRNA are sometimes found, with and without introns. In the case of the tRNAs, where all the molecules conform to the same general structure, it seems unlikely that evolution brought together the two regions of the gene. After all, the different regions are involved in the base pairing that gives significance to the structure. In this case, then, the introns must have been inserted into continuous genes.
Organelle genomes provide some striking connections between the prokaryotic and eukaryotic worlds. Because of many general similarities between mitochondria or chloroplasts and bacteria, it seems likely that the organelles originated by an endosymbiosis in which an early bacterial prototype was inserted into eukaryotic cytoplasm. Yet in contrast with these resemblances to bacteria (for example, in protein or RNA synthesis), some organelle genes possess introns and therefore resemble eukaryotic nuclear genes.
Introns are found in several chloroplast genes, including some that are homologous to genes of E. coli. This suggests that the endosymbiotic event occurred before introns were lost from the prokaryotic line. If a suitable gene can be found, it may therefore be possible to trace gene lineage back to the period when endosymbiosis occurred.
The mitochondrial genome presents a particularly striking case. The genes of yeast and mammalian mitochondria code for virtually identical mitochondrial proteins, in spite of a considerable difference in gene organization. Vertebrate mitochondrial genomes are very small, with an extremely compact organization of continuous genes, whereas yeast mitochondrial genomes are larger and have some complex interrupted genes. Which is the ancestral form? The yeast mitochondrial introns (and certain other introns) can be mobile: they are self-contained sequences that can splice out of the RNA and insert DNA copies elsewhere. This suggests that they may have arisen by insertions into the genome (see 26.5 Some group I introns code for endonucleases that sponsor mobility and 26.6 Some group II introns code for reverse transcriptases).