Molecular Biology: The genetic code is triplet

KEY TERMS:

The genetic code is the correspondence between triplets in DNA (or RNA) and amino acids in protein.
A codon is a triplet of nucleotides that represents an amino acid or a termination signal.
Frameshift mutations arise by deletions or insertions that are not a multiple of 3 base pairs and change the frame in which triplets are translated into protein. The term is inappropriate outside of coding sequences.
Acridines are mutagens that act on DNA to cause the insertion or deletion of a single base pair. They were useful in defining the triplet nature of the genetic code.
A suppressor is a second mutation that compensates for or alters the effects of a primary mutation.
A frameshift suppressor is an insertion or deletion of a base that restores the original reading frame in a gene that has had a base deletion or insertion.

KEY CONCEPTS:

The genetic code is read in triplet nucleotides called codons.
The triplets are nonoverlapping and are read from a fixed starting point.
Mutations that insert or delete individual bases cause a shift in the triplet sets after the site of mutation.
Combinations of mutations that together insert or delete 3 bases (or a multiple of 3) insert or delete amino acids but do not change the reading of the triplets beyond the last site of mutation.

Each gene represents a particular protein chain. The concept that each protein consists of a particular series of amino acids dates from Sanger's characterization of insulin in the 1950s. The discovery that a gene consists of DNA confronts us with the question of how a sequence of nucleotides in DNA represents a sequence of amino acids in protein.

A crucial feature of the general structure of DNA is that it is independent of the particular sequence of its component nucleotides. The sequence of nucleotides in DNA is important not because of its structure per se, but because it codes for the sequence of amino acids that constitutes the corresponding polypeptide. The relationship between a sequence of DNA and the sequence of the corresponding protein is called the genetic code.

The structure and/or enzymatic activity of each protein follows from its primary sequence of amino acids. Because the gene determines the sequence of amino acids in each protein, it carries all the information needed to specify an active polypeptide chain. In this way, a single type of structure, the gene, is able to represent itself in innumerable polypeptide forms.

Together the various protein products of a cell undertake the catalytic and structural activities that are responsible for establishing its phenotype. Of course, in addition to sequences that code for proteins, DNA also contains certain sequences whose function is to be recognized by regulator molecules, usually proteins. Here the function of the DNA is determined by its sequence directly, not via any intermediary code. Both types of region, genes expressed as proteins and sequences recognized as such, constitute genetic information.

The genetic code is deciphered by a complex apparatus that interprets the nucleic acid sequence. This apparatus is essential if the information carried in DNA is to have meaning. In any given region, only one of the two strands of DNA codes for protein, so we write the genetic code as a sequence of bases (rather than base pairs).

The genetic code is read in groups of three nucleotides, each group representing one amino acid. Each trinucleotide sequence is called a codon. A gene includes a series of codons that is read sequentially from a starting point at one end to a termination point at the other end. Written in the conventional 5′→3′ direction, the nucleotide sequence of the DNA strand that codes for protein corresponds to the amino acid sequence of the protein written in the direction from N-terminus to C-terminus.
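To make this reading rule concrete, here is a minimal Python sketch that reads a sequence as successive triplets from a fixed start, 5′→3′, and lists the amino acids N→C until it meets a termination codon. The codon assignments are not given in this post; the sketch assumes the standard genetic code (one-letter amino acid symbols, `*` for termination) and writes sequences as RNA (U in place of T), matching the codon examples used in this post. The names `CODON_TABLE` and `translate` are illustrative only.

```python
from itertools import product

# Assumed: the standard genetic code. Amino acids are listed in one-letter
# code for codons enumerated over U, C, A, G at positions 1, 2 and 3
# (third position varying fastest); '*' marks a termination codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str, start: int = 0) -> str:
    """Read nonoverlapping triplets 5'->3' from `start` and return the
    protein N->C, stopping at the first termination codon.
    A trailing partial codon is ignored."""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("UUUAAAGGGCCC"))  # FKGP
print(translate("AUGUUUUAAGGG"))  # MF (reading stops at UAA)
```

Because the position of every codon is fixed by the starting point, the loop needs only one index that advances three bases at a time.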

The genetic code is read in nonoverlapping triplets from a fixed starting point:

Nonoverlapping implies that each codon consists of three nucleotides and that successive codons are represented by successive trinucleotides.
The use of a fixed starting point means that assembly of a protein must start at one end and work to the other, so that different parts of the coding sequence cannot be read independently.
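The two properties listed above can be sketched in a few lines of Python. The overlapping option is included only as a hypothetical contrast (it is not how the code is read), and the function name `read_triplets` is an illustrative assumption.

```python
def read_triplets(seq: str, start: int = 0, overlapping: bool = False) -> list:
    """Return the triplets read from `start`.

    Nonoverlapping (the actual rule): successive trinucleotides, step 3.
    Overlapping (hypothetical contrast): every base begins a new triplet, step 1.
    """
    step = 1 if overlapping else 3
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, step)]


seq = "UUUAAAGGGCCC"
print(read_triplets(seq))                         # ['UUU', 'AAA', 'GGG', 'CCC']
print(len(read_triplets(seq, overlapping=True)))  # 10
print(read_triplets(seq, start=1))                # ['UUA', 'AAG', 'GGC']
```

The last line shows why the starting point matters: beginning one base later yields an entirely different set of triplets from the same sequence.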

The nature of the code predicts that two types of mutations will have different effects. If a particular sequence is read sequentially, such as:

UUU AAA GGG CCC (codons)

aa1 aa2 aa3 aa4 (amino acids)

then a point mutation will affect only one amino acid. For example, the substitution of an A by some other base (X) causes aa2 to be replaced by aa5:

UUU AAX GGG CCC

aa1 aa5 aa3 aa4

because only the second codon has been changed.
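A short Python sketch of the substitution example, assuming the same placeholder base X used in the text; `codons` and `changed_codons` are hypothetical helper names. It compares the two sequences codon by codon in the same reading frame.

```python
def codons(seq: str) -> list:
    """Split a sequence into successive triplets from position 0."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def changed_codons(original: str, mutant: str) -> list:
    """Indices of codons that differ when both sequences are read in the same frame."""
    pairs = zip(codons(original), codons(mutant))
    return [i for i, (a, b) in enumerate(pairs) if a != b]


wild = "UUUAAAGGGCCC"
# Replace the base at position 5 (0-based), the last A of AAA, with X.
point_mutant = wild[:5] + "X" + wild[6:]

print(codons(point_mutant))                # ['UUU', 'AAX', 'GGG', 'CCC']
print(changed_codons(wild, point_mutant))  # [1]
```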

But a mutation that inserts or deletes a single base will change the triplet sets for the entire subsequent sequence. A change of this sort is called a frameshift. An insertion might take the form:

UUU AAX AGG GCC C

aa1 aa5 aa6 aa7

Because the new sequence of triplets is completely different from the old one, the entire amino acid sequence of the protein is altered beyond the site of mutation. So the function of the protein is likely to be lost completely.
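The insertion example can be reproduced the same way. This minimal sketch again uses the placeholder base X; the helper name `codons` is illustrative. Every codon from the insertion site onward changes, and one base is left over at the end.

```python
def codons(seq: str) -> list:
    """Successive nonoverlapping triplets from position 0; a trailing partial codon is dropped."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


wild = "UUUAAAGGGCCC"
# Insert X between the second and third A of the AAA codon.
insertion_mutant = wild[:5] + "X" + wild[5:]

print(codons(insertion_mutant))  # ['UUU', 'AAX', 'AGG', 'GCC']
print(insertion_mutant[12:])     # C (no longer part of a complete codon)

changed = [i for i, (a, b) in enumerate(zip(codons(wild), codons(insertion_mutant))) if a != b]
print(changed)  # [1, 2, 3]
```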

Frameshift mutations are induced by the acridines, compounds that bind to DNA and distort the structure of the double helix, causing additional bases to be incorporated or omitted during replication. Each mutagenic event sponsored by an acridine results in the addition or removal of a single base pair (for review see Roth, 1974).

If an acridine mutant is produced by, say, addition of a nucleotide, it should revert to wild type by deletion of the nucleotide. But reversion can also be caused by deletion of a different base, at a site close to the first. Combinations of such mutations provided revealing evidence about the nature of the genetic code.

Figure 1.33 illustrates the properties of frameshift mutations. An insertion or a deletion changes the entire protein sequence following the site of mutation. But the combination of an insertion and a deletion causes the code to be read incorrectly only between the two sites of mutation; correct reading resumes after the second site.
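As a sketch of the behavior summarized for Figure 1.33 (not a reproduction of the figure itself), the following Python compares a single insertion with an insertion followed by a nearby deletion. The sequence, the positions and the helper names `insert`, `delete` and `changed_codons` are hypothetical choices for illustration.

```python
def codons(seq: str) -> list:
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def insert(seq: str, pos: int, base: str = "X") -> str:
    """(+) mutation: add one base before position `pos`."""
    return seq[:pos] + base + seq[pos:]


def delete(seq: str, pos: int) -> str:
    """(-) mutation: remove the base at position `pos`."""
    return seq[:pos] + seq[pos + 1:]


def changed_codons(original: str, mutant: str) -> list:
    return [i for i, (a, b) in enumerate(zip(codons(original), codons(mutant))) if a != b]


wild = "UUUAAAGGGCCCUUUAAA"        # six codons
plus = insert(wild, 4)             # insertion inside codon 1
plus_minus = delete(plus, 10)      # deletion inside codon 3

print(changed_codons(wild, plus))        # [1, 2, 3, 4, 5]  frameshift to the end
print(codons(plus_minus))                # ['UUU', 'AXA', 'AGG', 'GCC', 'UUU', 'AAA']
print(changed_codons(wild, plus_minus))  # [1, 2, 3]  correct reading resumes after the deletion
```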

Genetic analysis of acridine mutations in the rII region of phage T4 in 1961 showed that all the mutations could be classified into one of two sets, described as (+) and (–). Either type of mutation by itself causes a frameshift, the (+) type by virtue of a base addition, the (–) type by virtue of a base deletion. Double mutant combinations of the types (+ +) and (– –) continue to show mutant behavior. But combinations of the types (+ –) or (– +) suppress one another, so that one mutation is described as a suppressor of the other. (In the context of this work, "suppressor" is used in an unusual sense, because the second mutation is in the same gene as the first.)
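One way to see why the double mutants alone cannot fix the codon length is to treat each mutation as a one-base shift and ask, for a hypothetical codon length k, whether the net shift returns the reading frame to the original. The Python sketch below does this under that simplifying assumption; `+` and `-` are ASCII stand-ins for (+) and (–), and the function names are illustrative.

```python
from itertools import product


def frame_restored(signs: str, codon_length: int) -> bool:
    """'+' = one-base insertion, '-' = one-base deletion.
    Simplified model: the frame is restored when the net shift is a multiple of the codon length."""
    net_shift = signs.count("+") - signs.count("-")
    return net_shift % codon_length == 0


def double_mutant_pattern(codon_length: int) -> dict:
    combos = ("".join(p) for p in product("+-", repeat=2))
    return {c: frame_restored(c, codon_length) for c in combos}


# Observed behavior described above: True = suppressed (wild type).
observed = {"++": False, "+-": True, "-+": True, "--": False}

for k in range(1, 7):
    print(k, double_mutant_pattern(k) == observed)
# prints True for k = 3, 4, 5, 6 and False for k = 1, 2
```

Any codon length of 3 or more predicts the same double-mutant pattern, which is the limitation pointed out in the next paragraph.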

These results show that the genetic code must be read as a sequence that is fixed by the starting point, so that an addition and a deletion compensate for each other, whereas double additions or double deletions remain mutant. But this does not reveal how many nucleotides make up each codon.

When triple mutants are constructed, only (+ + +) and (– – –) combinations show the wild-type phenotype, while other combinations remain mutant. If we take three additions or three deletions to correspond respectively to the addition or omission overall of a single amino acid, this implies that the code is read in triplets. An incorrect amino acid sequence is found between the two outside sites of mutation, and the sequence on either side remains wild type, as indicated in Figure 1.33 (Benzer and Champe, 1961; Crick et al., 1961).
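Extending the same simplified frame-shift model to single, double and triple mutants shows how the triple result singles out a codon length of three. This is a sketch under stated assumptions: it ignores, for instance, the possibility that the misread stretch contains a termination codon or an unacceptable amino acid, and the phenotype table is a simplified summary of the results described in this post, not experimental data. `consistent_lengths` is a hypothetical name.

```python
from itertools import product


def frame_restored(signs: str, codon_length: int) -> bool:
    """Net one-base shifts ('+' insertion, '-' deletion) restore the frame
    when they sum to a multiple of the codon length."""
    net_shift = signs.count("+") - signs.count("-")
    return net_shift % codon_length == 0


def consistent_lengths(observed: dict, candidates=range(1, 7)) -> list:
    """Codon lengths whose predictions match every observed phenotype."""
    return [k for k in candidates
            if all(frame_restored(s, k) == wild for s, wild in observed.items())]


# Simplified phenotypes (True = wild type).
observed = {s: False for s in ("+", "-", "++", "--")}
observed.update({"+-": True, "-+": True})
for combo in ("".join(p) for p in product("+-", repeat=3)):
    observed[combo] = combo in ("+++", "---")

print(consistent_lengths(observed))  # [3]
```

Lengths 1 and 2 are excluded by the mutant singles and doubles, and lengths of 4 or more by the wild-type (+ + +) and (– – –) triples.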

October 13, 2012
