- An exon is any segment of an interrupted gene that is represented in the mature RNA product.
- An intron (intervening sequence) is a segment of DNA that is transcribed but then removed from within the transcript by splicing together the sequences (exons) on either side of it.
- A transcript is the RNA product made by copying one strand of DNA. It may require processing to generate a mature RNA.
- RNA splicing is the process of excising the sequences in RNA that correspond to introns, so that the sequences corresponding to exons are connected into a continuous mRNA.
- A structural gene codes for any RNA or protein product other than a regulator.
- Eukaryotic genomes contain interrupted genes in which exons (represented in the final RNA product) alternate with introns (removed from the initial transcript).
- The exon sequences occur in the same order in the gene and in the RNA, but an interrupted gene is longer than its final RNA product because of the presence of the introns.
Until eukaryotic genes were characterized by molecular mapping, we assumed that they would have the same organization as prokaryotic genes. We expected the gene to consist of a length of DNA that is colinear with the protein. But a comparison between the structure of DNA and the corresponding mRNA shows a discrepancy in many cases. The mRNA always includes a nucleotide sequence that corresponds exactly with the protein product according to the rules of the genetic code. But the gene includes additional sequences that lie within the coding region, interrupting the sequence that represents the protein. (For a description of the discovery see Great Experiments: The discovery of RNA splicing and Great Experiments: The discovery of split genes and RNA splicing.)
The sequences of DNA comprising an interrupted gene are divided into the two categories depicted in Figure 2.1:
- The exons are the sequences represented in the mature RNA. By definition, a gene starts and ends with exons, corresponding to the 5′ and 3′ ends of the RNA.
- The introns are the intervening sequences that are removed when the primary transcript is processed to give the mature RNA.
The expression of interrupted genes requires an additional step that does not occur for uninterrupted genes. The DNA gives rise to an RNA copy (a transcript) that exactly represents the genome sequence. But this RNA is only a precursor; it cannot be used for producing protein. First the introns must be removed from the RNA to give a messenger RNA that consists only of the series of exons. This process is called RNA splicing. It involves a precise deletion of an intron from the primary transcript; the ends of the RNA on either side are joined to form a covalently intact molecule (see 24 RNA splicing and processing).
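The splicing step described above can be pictured with a small sketch. It is an illustration only, not a model of the biochemical mechanism: the sequence is an invented toy transcript, the exon coordinates are assumed to be known in advance, and the helper names `introns_from_exons` and `splice` are hypothetical. Exons are written in uppercase and introns in lowercase purely to make them easy to see.

```python
from typing import List, Tuple

# Half-open, 0-based interval [start, end) on the primary transcript.
Interval = Tuple[int, int]


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Return the intervals lying between consecutive exons (the introns)."""
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


def splice(primary_transcript: str, exons: List[Interval]) -> str:
    """Join the exon segments in their original order, dropping the introns."""
    return "".join(primary_transcript[start:end] for start, end in exons)


# Invented toy primary transcript: exon, intron, exon, intron, exon.
primary = "AUGGCC" + "guaaguuucag" + "GAUUAC" + "guccag" + "UGA"

# Assumed exon coordinates for this toy transcript.
exons = [(0, 6), (17, 23), (29, 32)]

introns = introns_from_exons(exons)
mature = splice(primary, exons)

print(introns)                     # [(6, 17), (23, 29)]
print(mature)                      # AUGGCCGAUUACUGA
print(len(primary), len(mature))   # 32 15
```

The exons keep the same order in the precursor and in the spliced product, and the precursor is longer than the mature RNA by exactly the combined length of the introns.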
The structural gene comprises the region in the genome between points corresponding to the 5′ and 3′ terminal bases of mature mRNA. We know that transcription starts at the 5′ end of the mRNA, but usually it extends beyond the 3′ end, which is generated by cleavage of the RNA (see 24.19 The 3′ ends of mRNAs are generated by cleavage and polyadenylation). The gene is considered to include the regulatory regions on both sides of the gene that are required for initiating and (sometimes) terminating gene expression.