The Genetic Code
Soon after the structure of DNA was proposed, Francis Crick turned his thoughts to the Genetic Code. At first he realised that any code that used only 2 bases at a time did not have enough information capacity to specify all of the amino acids found in proteins. He also thought that a code that used 3 bases at a time had too much capacity.
In fact, the idea that there are 20 standard amino acids was not clear at that time. The search to unravel the Genetic Code was partly instrumental in leading to that conclusion as well.
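The capacity argument is simple arithmetic: with four bases, a code read n bases at a time has 4^n possible words. A minimal Python sketch of that count follows. The function name is illustrative, and the figure of 20 amino acids is the repertoire that was settled on only later.

```python
# Count the distinct code words available when reading n bases at a time
# from the four-letter RNA alphabet.
BASES = "UCAG"
AMINO_ACIDS = 20  # the standard repertoire (agreed on only later)


def code_capacity(word_length: int) -> int:
    """Return how many distinct words of this length four bases allow."""
    return len(BASES) ** word_length


for n in (1, 2, 3):
    capacity = code_capacity(n)
    verdict = "enough" if capacity >= AMINO_ACIDS else "too few"
    print(f"{n} base(s) per word: {capacity} words ({verdict} for {AMINO_ACIDS} amino acids)")
```

A doublet code gives 16 words, fewer than 20, while a triplet code gives 64, which is more than three times what is needed.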
Crick and Sidney Brenner, along with their many colleagues, spent a lot of time thinking about the Code and how it might be interpreted. Once it was accepted that there was a standard repertoire of 20 amino acids, the triplet nature of the code followed.
What did not follow was how these triplets might be arranged. For a time, they considered an overlapping arrangement of codons (a word coined by Seymour Benzer) but they were able to dismiss this on the basis of protein sequence analysis.
Once they felt that the code was non-overlapping, the question became one of knowing where each triplet began. Proof that the code was indeed a triplet as well as the determination of the meaning of each triplet came from that old standby: experimentation.
[26-2]
Crick started a series of elegant genetic experiments using bacteriophage crosses which demonstrated conclusively that the genetic code was a triplet code. At the same time, Marshall Nirenberg and Heinrich Matthaei showed that UUU was a codon for phenylalanine. The way appeared clear to solve the complete code. For this work, Nirenberg shared the 1968 Nobel Prize in Physiology or Medicine with Robert Holley (who solved the structure of yeast alanyl-tRNA, the first determination of the complete chemical structure of a biologically active nucleic acid) and with Har Gobind Khorana (whose methods for synthesising nucleic acids were a prerequisite for the final solution of the genetic code).
Solving the Genetic Code
The first steps to solving the Genetic Code depended on the development of a cell-free in vitro translation system by Paul Zamecnik. This system, which consisted of a membrane-free cell supernatant, ATP, GTP, radioactively labelled amino acids and RNA, was capable of directing the synthesis of radioactively labelled protein.
[S5-15]
In 1961, Marshall Nirenberg and Heinrich Matthaei were using such a system to investigate the synthesis of viral proteins. They used the Tobacco Mosaic Virus (TMV) RNA as their experimental template. As a control RNA template they used the homopolymer poly(U) -- which they synthesized from UDP using polynucleotide phosphorylase. They did not expect that this template would code for or direct protein synthesis.
But it did! Nirenberg and Matthaei went on to show that the only amino acid that was incorporated into a polypeptide when poly(U) was the RNA template was phenylalanine. The way to crack the code was open!
[Lod4-28]
Francis Crick (in What Mad Pursuit) describes how he heard about Nirenberg's results while on a visit to the Biochemical Congress in Moscow in 1961:
"The Moscow meeting was made especially interesting because of the results reported by Marshall Nirenberg, then almost unknown. I had heard rumours of these experiments but no details. Matt Meselson, whom I ran into in a corridor, alerted me to Marshall's talk in a remote seminar room. I was so impressed that I asked Marshall to take part in a much larger meeting, of which I was the chairman. What he had discovered was that he could add an artificial message to a test-tube system that synthesized proteins and get it to direct some synthesis. In detail, he had added poly U -- the RNA message consisting almost entirely of a sequence of uracils -- to the system and it had synthesized phenylalanine. This suggested that UUU (assuming a triplet code) was a codon for phenylalanine (one of the "magic twenty" amino acids), as indeed it is. I later claimed that the audience was "startled" (I think I originally wrote "electrified") to receive this news. Seymour Benzer countered this with a photograph showing everyone looking extremely bored! Nevertheless it was an epoch-making discovery, after which there was no looking back."
The use of poly(A) and poly(C) as templates similarly showed that AAA was a codon for lysine and that CCC was a codon for proline. However, poly(G) did not work at all in the system.
This use of homopolymers is clearly quite limited. The use of random mixed copolymers helped to extend the utility of the system and the information obtained from it.
Random copolymers can be synthesized from a mixture of two ribonucleotides with polynucleotide phosphorylase. Thus if ADP and UDP are used in a 5:1 ratio, then the frequency of each possible triplet in the synthesized RNA will vary according to this ratio. For example, AAA triplets (expected frequency (5/6)^3, about 0.579) will be found 125 times more frequently than UUU triplets (expected frequency (1/6)^3, about 0.00463).
CODON    FREQUENCY    RELATIVE FREQUENCY (AAA = 100, rounded)
AAA      0.579        100
AAU      0.116        20
AUA      0.116        20
UAA      0.116        20
AUU      0.023        4
UAU      0.023        4
UUA      0.023        4
UUU      0.00463      1
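The frequencies in the table can be reproduced with a short calculation. The Python sketch below assumes that each position in the copolymer is filled independently, with probability equal to the nucleotide's share of the input mixture. The function name and the dictionary layout are illustrative only.

```python
from itertools import product


def triplet_frequencies(base_fractions):
    """Map each triplet to its expected frequency in a random copolymer.

    Assumes every position is drawn independently, with probability equal
    to the nucleotide's fraction of the input mixture.
    """
    freqs = {}
    for triplet in product(base_fractions, repeat=3):
        p = 1.0
        for base in triplet:
            p *= base_fractions[base]
        freqs["".join(triplet)] = p
    return freqs


# A and U supplied in a 5:1 ratio, matching the codons in the table above.
freqs = triplet_frequencies({"A": 5 / 6, "U": 1 / 6})

# Print codon, frequency, and frequency relative to AAA = 100.
for codon, p in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(f"{codon}  {p:.5f}  {100 * p / freqs['AAA']:.1f}")
```

Comparing these expected frequencies with the measured incorporation of each amino acid is what narrows down the base composition of its codons. The order of the bases within a codon cannot be recovered this way.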
By measuring the ratios of the different amino acids that are incorporated into protein using random copolymer templates, it is possible to narrow down the range of codons that correspond to particular amino acids.
This method did not yield all of the codon assignments. That required the chemical synthesis of short oligonucleotides with defined sequences. These were used in two ways:
- Nirenberg and Phil Leder showed that aminoacylated tRNAs could be bound to ribosomes if the ribosomes carried trinucleotides acting as mRNA.
[Lod4-30] [S5-16]
- Gobind Khorana showed that tri- and tetra-nucleotides could be polymerized into polymers with repeating sequences that could be used in cell-free in vitro translation assays.
In the case of trinucleotides, three polypeptides will be synthesized, one for each reading frame, each of which is a homopolymer of a single amino acid.
[MVH27-2] [Lod4-29]
In the case of tetranucleotides, a single polypeptide (usually) will be synthesized which contains a repeating amino acid sequence, because all three reading frames give the same repeating sequence of four codons.
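The behaviour of repeating polymers can be sketched by translating a repeated sequence in each of the three reading frames. The Python sketch below builds the standard codon table from a compact string of one-letter amino acid symbols. The repeat units UUC and UAUC, the function names and the number of copies are chosen here for illustration. The translation ignores start and stop signals and writes a stop codon as "*".

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acids for all 64 codons, ordered by first, second and
# third base in UCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(rna, frame=0):
    """Read non-overlapping triplets starting at offset `frame`."""
    codons = (rna[i:i + 3] for i in range(frame, len(rna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def all_frames(repeat_unit, copies=12):
    """Translate a repeating polymer in each of the three reading frames."""
    rna = repeat_unit * copies
    return [translate(rna, frame) for frame in range(3)]


print(all_frames("UUC"))   # repeating trinucleotide
print(all_frames("UAUC"))  # repeating tetranucleotide
```

For the trinucleotide, each frame yields a different homopolymer (Phe, Ser or Leu). For the tetranucleotide, every frame yields the same Tyr-Leu-Ser-Ile repeat, only starting at a different residue. A tetranucleotide whose repeat contains a stop codon would not behave this way, which is presumably why the text says "usually".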
In these ways, the entire Genetic Code was determined.
The Genetic Code
First                  Second base                   Third
base     U          C          A          G          base
U        UUU Phe    UCU Ser    UAU Tyr    UGU Cys    U
         UUC Phe    UCC Ser    UAC Tyr    UGC Cys    C
         UUA Leu    UCA Ser    UAA Stop   UGA Stop   A
         UUG Leu    UCG Ser    UAG Stop   UGG Trp    G
C        CUU Leu    CCU Pro    CAU His    CGU Arg    U
         CUC Leu    CCC Pro    CAC His    CGC Arg    C
         CUA Leu    CCA Pro    CAA Gln    CGA Arg    A
         CUG Leu    CCG Pro    CAG Gln    CGG Arg    G
A        AUU Ile    ACU Thr    AAU Asn    AGU Ser    U
         AUC Ile    ACC Thr    AAC Asn    AGC Ser    C
         AUA Ile    ACA Thr    AAA Lys    AGA Arg    A
         AUG Met    ACG Thr    AAG Lys    AGG Arg    G
G        GUU Val    GCU Ala    GAU Asp    GGU Gly    U
         GUC Val    GCC Ala    GAC Asp    GGC Gly    C
         GUA Val    GCA Ala    GAA Glu    GGA Gly    A
         GUG Val    GCG Ala    GAG Glu    GGG Gly    G
For a simpler view of this table go to http://esg-www.mit.edu:8001/esgbio/dogma/images/code.gif. [T26-1]
The following are features to note in the genetic code:
- The code is triplet, unpunctuated and nonoverlapping. Three bases are required to specify each amino acid. There are no gaps between codons. Codons do not overlap.
[MVH27-1]
- The code is degenerate. Most amino acids are specified by more than one codon. In fact, only Met and Trp are specified by a single codon:
# of codons    amino acids
1              Met, Trp
2              Asn, Asp, Cys, Gln, Glu, His, Lys, Phe, Tyr
3              Ile
4              Ala, Gly, Pro, Thr, Val
6              Arg, Leu, Ser
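Degeneracy is found mostly in the third nucleotide of the codon. The amino acids with six codons are the exceptions: the codons for Arg and Leu also differ in the first nucleotide, and those for Ser differ in both the first and second nucleotides.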
- The Genetic Code is Unambiguous.
In general, no codon specifies more than one amino acid. The exceptions so far are AUG, UGA and UAG. In the first case, AUG specifies both methionine and N-formyl-methionine, which is used to initiate protein synthesis in bacteria. In the second case, UGA specifies the twenty-first amino acid, selenocysteine, as well as being a stop codon. And, in the last case, UAG specifies the twenty-second amino acid (the most recent to be added to the list), pyrrolysine, as well as being a stop codon.
- There are 3 stop codons: UAA, UAG, and UGA.
- There is one start codon: AUG. However, note that GUG and UUG are occasionally found as start codons.
- The Genetic Code is Universal. Although there are a number of exceptions to this rule -- particularly in organelle systems -- the genetic code is remarkably the same in all organisms. The most common exception is the use of UGA as a codon for Tryptophan in mitochondria.
[Images: tables of some of the known exceptions to universality in the genetic code used in the nucleus and in the genetic code used in mitochondria]
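Several of the features listed above can be checked directly against the standard table. The Python sketch below rebuilds the table from a compact string of one-letter amino acid symbols, then counts codons per amino acid, lists the stop codons, and finds the amino acids whose codons differ before the third base. All variable names are illustrative.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "UCAG"
# One-letter amino acids for all 64 codons, ordered by first, second and
# third base in UCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Degeneracy: group amino acids by the number of codons that specify them.
codon_counts = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
by_degeneracy = defaultdict(list)
for aa, n in sorted(codon_counts.items()):
    by_degeneracy[n].append(aa)

# Stop codons are the entries marked "*".
stop_codons = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")

# Amino acids whose codons do not all share the same first two bases.
prefixes = defaultdict(set)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        prefixes[aa].add(codon[:2])
differ_before_third = sorted(aa for aa, p in prefixes.items() if len(p) > 1)

for n in sorted(by_degeneracy):
    print(n, " ".join(by_degeneracy[n]))
print("stop codons:", stop_codons)
print("differ before third base:", differ_before_third)
```

The sketch covers only the standard table. It does not model the dual readings of AUG, UGA and UAG or the organelle variants mentioned above.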