The Optimal Design of the Genetic Code

Were there no example in the world of contrivance except that of the eye, it would be alone sufficient to support the conclusion which we draw from it, as to the necessity of an intelligent Creator.

–William Paley, Natural Theology

In his classic work, Natural Theology, William Paley surveyed a range of biological systems, highlighting their similarities to human-made designs. Paley noticed that human designs typically consist of various components that interact in a precise way to accomplish a purpose. According to Paley, human designs are contrivances (things produced with skill and cleverness), and they come about via the work of human agents, that is, intelligent designers. And because biological systems are also contrivances, he reasoned, they, too, must come about via the work of a Creator.

For Paley, the pervasiveness of biological contrivances made the case for a Creator compelling. But he was especially struck by the vertebrate eye. For Paley, even if the eye were the only example of a biological contrivance available to us, its sophisticated design and elegant complexity alone would justify the “necessity of an intelligent Creator” to explain its origin.

As a biochemist, I am impressed with the elegant designs of biochemical systems. The sophistication and ingenuity of these designs convinced me as a graduate student that life must stem from the work of a Mind. In my book The Cell’s Design, I follow in Paley’s footsteps by highlighting the eerie similarity between human designs and biochemical systems—a similarity I describe as an intelligent design pattern. Because biochemical systems conform to the intelligent design pattern, they must be the work of a Creator.

As with Paley, I view the pervasiveness of the intelligent design pattern in biochemical systems as critical to making the case for a Creator. Yet, in particular, I am struck by the design of a single biochemical system: namely, the genetic code. On the basis of the structure of the genetic code alone, I think one is justified to conclude that life stems from the work of a Divine Mind. The latest work by a team of German biochemists on the genetic code’s design convinces me all the more that the genetic code is the product of a Creator’s handiwork.1

To understand the significance of this study and the code’s elegant design, a short primer on molecular biology is in order. (Readers who have a background in biology can skip ahead to The Optimal Genetic Code.)

Proteins

The “workhorse” molecules of life, proteins take part in essentially every cellular and extracellular structure and activity. Proteins are chain-like molecules folded into precise three-dimensional structures. Often, the protein’s three-dimensional architecture determines the way it interacts with other proteins to form a functional complex.

Proteins form when the cellular machinery links together (in a head-to-tail fashion) smaller subunit molecules called amino acids. To a first approximation, the cell employs 20 different amino acids to make proteins. The amino acids that make up proteins possess a variety of chemical and physical properties.

Figure 1: The Amino Acids. Image credit: Shutterstock

Each specific amino acid sequence imparts the protein with a unique chemical and physical profile along the length of its chain. The chemical and physical profile determines how the protein folds and, therefore, its function. Because structure determines the function of a protein, the amino acid sequence is key to dictating the type of work a protein performs for the cell.

DNA

The cell’s machinery uses the information harbored in the DNA molecule to make proteins. Like proteins, DNA consists of chain-like structures, in this case known as polynucleotides. Two polynucleotide chains align in an antiparallel fashion to form a DNA molecule. (The two strands are arranged parallel to one another with the starting point of one strand located next to the ending point of the other strand, and vice versa.) The paired polynucleotide chains twist around each other to form the well-known DNA double helix. The cell’s machinery forms polynucleotide chains by linking together four different subunit molecules called nucleotides. The four nucleotides used to build DNA chains are distinguished by their bases: adenine, guanine, cytosine, and thymine, familiarly known as A, G, C, and T, respectively.
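The antiparallel arrangement can be made concrete with a few lines of Python. This is only a minimal sketch: the function name partner_strand is hypothetical, and the pairing rules it relies on (A with T, G with C) are standard biochemistry that the primer above does not spell out.

```python
# Standard base-pairing rules (A with T, G with C); an addition not stated in the text above.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def partner_strand(strand):
    """Return the partner strand, read from its own starting point."""
    # Pair every nucleotide, then reverse the order, because the partner
    # strand runs in the opposite direction (antiparallel).
    return "".join(COMPLEMENT[base] for base in reversed(strand))


print(partner_strand("ATGC"))  # GCAT
```

Applying the function twice returns the original strand, which is one way to see that each strand fully determines the other.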

Figure 2: The Structure of DNA. Image credit: Shutterstock

As noted, DNA stores the information necessary to make all the proteins used by the cell. The sequence of nucleotides in the DNA strands specifies the sequence of amino acids in protein chains. Scientists refer to the amino-acid-coding nucleotide sequence that is used to construct proteins along the DNA strand as a gene.

The Genetic Code

A one-to-one relationship cannot exist between the 4 different nucleotides of DNA and the 20 different amino acids used to assemble polypeptides. The cell addresses this mismatch by using a code composed of groupings of three nucleotides to specify the 20 different amino acids.
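The arithmetic behind the three-nucleotide grouping is easy to check. The sketch below is illustrative only (the function name smallest_codon_length is made up for this example): it counts how many distinct groupings a four-letter alphabet allows at each length and finds the shortest length that can cover 20 amino acids.

```python
def smallest_codon_length(alphabet_size, meanings_needed):
    """Shortest grouping length whose combinations cover everything that must be encoded."""
    length = 1
    while alphabet_size ** length < meanings_needed:
        length += 1
    return length


# Number of distinct groupings for lengths 1, 2, and 3 with 4 nucleotides.
for n in (1, 2, 3):
    print(n, 4 ** n)  # 1 4, then 2 16, then 3 64

print(smallest_codon_length(4, 20))  # 3
```

Single nucleotides (4 options) and pairs (16 options) fall short of 20, while triplets give 64 options, more than enough.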

The cell uses a set of rules to relate these nucleotide triplet sequences to the 20 amino acids making up proteins. Molecular biologists refer to this set of rules as the genetic code. The nucleotide triplets, or “codons” as they are called, represent the fundamental communication units of the genetic code, which is essentially universal among all living organisms.

Sixty-four codons make up the genetic code. Because the code only needs to encode 20 amino acids, some of the codons are redundant. That is, different codons code for the same amino acid. In fact, up to six different codons specify some amino acids. Others are specified by only one codon.
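The redundancy can be tallied directly from the standard codon table. In the sketch below, the table is built from the conventional 64-letter summary string (bases ordered U, C, A, G, with one-letter amino acid symbols and * for stop); the variable names are my own choices for illustration.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

codons_per_amino_acid = Counter(CODON_TABLE.values())

print(len(CODON_TABLE))  # 64
# Amino acids specified by six codons (leucine, arginine, serine).
print(sorted(a for a, n in codons_per_amino_acid.items() if n == 6))  # ['L', 'R', 'S']
# Amino acids specified by a single codon (methionine, tryptophan).
print(sorted(a for a, n in codons_per_amino_acid.items() if n == 1))  # ['M', 'W']
```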

Interestingly, some codons, called stop codons or nonsense codons, code for no amino acids. (For example, the codon UGA is a stop codon. Codons are conventionally written in their RNA form, in which U takes the place of T.) These codons always occur at the end of the gene, informing the cell where the protein chain ends.

Some coding triplets, called start codons, play a dual role in the genetic code. These codons not only encode amino acids, but also “tell” the cell where a protein chain begins. For example, the codon GUG encodes the amino acid valine and can also specify the starting point of a protein chain.
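The roles of start and stop codons can be illustrated with a toy translation routine. This is a simplified sketch, not a model of the cell’s actual machinery: the function translate is hypothetical, it assumes AUG (methionine), the most common start codon, as the default, and it ignores how real cells recognize start sites.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna, start_codon="AUG"):
    """Read codons from the first start codon until a stop codon is reached."""
    start = mrna.find(start_codon)
    if start == -1:
        return ""  # no start codon, so nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # a stop codon ends the protein chain
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCUUGGUGAAA"))  # MAW
```

In the example, reading begins at AUG, continues through GCU and UGG, and halts at the stop codon UGA, so the nucleotides on either side of that stretch are never translated.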

Figure 3: The Genetic Code. Image credit: Shutterstock

The Optimal Genetic Code

Based on visual inspection of the genetic code, biochemists had long suspected that the coding assignments weren’t haphazard, that is, not merely a “frozen accident.” Instead, it looked to them as though a rationale undergirds the genetic code’s architecture. This intuition was confirmed in the early 1990s. As I describe in The Cell’s Design, at that time, scientists from the University of Bath (UK) and from Princeton University quantified the error-minimization capacity of the genetic code. Their initial work indicated that the naturally occurring genetic code withstands the potentially harmful effects of substitution mutations better than all but 0.02 percent (1 out of 5,000) of randomly generated genetic codes with codon assignments different from the universal genetic code.2
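The general shape of such a comparison can be sketched in Python. Everything here is an illustrative assumption rather than the researchers’ actual procedure: the function names are hypothetical, the amino acid property is the Kyte-Doolittle hydropathy scale (swapped in as a convenient, widely known measure), random codes are made by shuffling the 20 amino acids among the natural code’s codon blocks, and the cost is the mean squared property change over all single-nucleotide substitutions. The numbers it produces should not be expected to match the published figures.

```python
import random
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
NATURAL_CODE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Kyte-Doolittle hydropathy values: an assumed stand-in for amino acid similarity.
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def error_cost(code):
    """Mean squared hydropathy change over all single-base substitutions."""
    total, count = 0.0, 0
    for codon, amino_acid in code.items():
        if amino_acid == "*":
            continue  # skip substitutions that start from a stop codon
        for position in range(3):
            for base in BASES:
                if base == codon[position]:
                    continue
                mutant = codon[:position] + base + codon[position + 1:]
                new_amino_acid = code[mutant]
                if new_amino_acid == "*":
                    continue  # skip substitutions that create a stop codon
                difference = HYDROPATHY[amino_acid] - HYDROPATHY[new_amino_acid]
                total += difference ** 2
                count += 1
    return total / count


def shuffled_code(rng):
    """Reassign the 20 amino acids among the natural codon blocks; stops stay put."""
    originals = sorted(set(AMINO_ACIDS) - {"*"})
    permuted = originals[:]
    rng.shuffle(permuted)
    mapping = dict(zip(originals, permuted))
    mapping["*"] = "*"
    return {codon: mapping[amino_acid] for codon, amino_acid in NATURAL_CODE.items()}


def fraction_better(trials=1000, seed=0):
    """Fraction of random codes with a lower error cost than the natural code."""
    rng = random.Random(seed)
    natural_cost = error_cost(NATURAL_CODE)
    better = sum(error_cost(shuffled_code(rng)) < natural_cost for _ in range(trials))
    return better / trials


if __name__ == "__main__":
    print(fraction_better(trials=1000))
```

A small fraction returned by fraction_better would mean that few random codes outperform the natural code under these particular assumptions; a different property scale or cost function could shift the result.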

Subsequent analysis performed later that decade incorporated additional factors. For example, some types of substitution mutations (called transitions) occur more frequently in nature than others (called transversions). As a case in point, an A-to-G substitution occurs more frequently than does either an A-to-C or an A-to-T mutation. When researchers incorporated this factor into their analysis, they discovered that the naturally occurring genetic code performed better than one million randomly generated genetic codes. In a separate study, they also found that the genetic code in nature resides near the global optimum for all possible genetic codes with respect to its error-minimization capacity.3
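The distinction between the two kinds of substitution comes down to chemical class: A and G are purines, while C and T are pyrimidines. A swap within a class is a transition, and a swap between classes is a transversion. The short sketch below (with a hypothetical function name) classifies the examples just mentioned.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(original, mutated):
    """Classify a single-nucleotide substitution as a transition or a transversion."""
    if original == mutated:
        raise ValueError("original and mutated nucleotides are identical")
    pair = {original, mutated}
    # Same chemical class on both sides means a transition.
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(substitution_type("A", "G"))  # transition
print(substitution_type("A", "C"))  # transversion
print(substitution_type("A", "T"))  # transversion
```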

It could be argued that the genetic code’s error-minimization properties are more dramatic than these results indicate. When researchers calculated the error-minimization capacity of one million randomly generated genetic codes, they discovered that the error-minimization values formed a distribution where the naturally occurring genetic code’s capacity occurred outside the distribution. Researchers estimate the existence of 10^18 (a quintillion) possible genetic codes possessing the same type and degree of redundancy as the universal genetic code. Nearly all of these codes fall within the error-minimization distribution. This finding means that of 10^18 possible genetic codes, only a few have an error-minimization capacity that approaches the code found universally in nature.
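The text does not say how the estimate of a quintillion codes is obtained. One counting argument that yields a number of this order, offered here as an assumption rather than as the researchers’ stated method, is to keep the natural code’s codon groupings fixed and count the ways of assigning the 20 amino acids to them, which is 20 factorial.

```python
import math

# Ways to assign 20 amino acids to 20 fixed groups of synonymous codons (assumed model).
block_permutations = math.factorial(20)

print(block_permutations)           # 2432902008176640000
print(f"{block_permutations:.1e}")  # 2.4e+18
```

The result, roughly 2.4 quintillion, is on the order of the 10^18 figure quoted above.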

Frameshift Mutations

Recently, researchers from Germany wondered if this same type of optimization applies to frameshift mutations. Biochemists have discovered that these mutations are much more devastating than substitution mutations. Frameshift mutations result when nucleotides are inserted into or deleted from the DNA sequence of the gene. If the number of inserted/deleted nucleotides is not divisible by three, the added or deleted nucleotides cause a shift in the gene’s reading frame—altering the codon groupings. Frameshift mutations change all the original codons to new codons at the site of the insertion/deletion and onward to the end of the gene.
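A short example shows why the divisible-by-three rule matters. The sequence below is invented for illustration, and the helper names codons and insert are hypothetical; the sketch simply regroups a sequence into triplets after an insertion.

```python
def codons(sequence):
    """Split a sequence into complete triplets, dropping any leftover nucleotides."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


def insert(sequence, position, nucleotides):
    """Insert nucleotides before the given zero-based position."""
    return sequence[:position] + nucleotides + sequence[position:]


gene = "AUGGCUGAUUGGAAA"  # made-up example sequence

print(codons(gene))                    # ['AUG', 'GCU', 'GAU', 'UGG', 'AAA']
# One inserted nucleotide shifts every codon from the insertion site onward.
print(codons(insert(gene, 4, "C")))    # ['AUG', 'GCC', 'UGA', 'UUG', 'GAA']
# Three inserted nucleotides add one codon but leave the rest of the frame intact.
print(codons(insert(gene, 3, "CCC")))  # ['AUG', 'CCC', 'GCU', 'GAU', 'UGG', 'AAA']
```

In the single-nucleotide case, the third codon happens to become UGA, a stop codon, so in this example the shifted gene would also be cut short.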

Figure 4: Types of Mutations. Image credit: Shutterstock

The Genetic Code Is Optimized to Withstand Frameshift Mutations

Like the researchers from the University of Bath, the German team generated 1 million random genetic codes with the same type and degree of redundancy as the genetic code found in nature. They discovered that the code found in nature is better optimized to withstand errors that result from frameshift mutations (involving either the insertion or deletion of 1 or 2 nucleotides) than most of the random genetic codes they tested.

The Genetic Code Is Optimized to Harbor Multiple Overlapping Codes

The optimization doesn’t end there. In addition to the genetic code, genes harbor other overlapping codes that independently direct the binding of histone proteins and transcription factors to DNA and dictate processes like messenger RNA folding and splicing. In 2007, researchers from Israel discovered that the genetic code is also optimized to harbor overlapping codes.4

The Genetic Code and the Case for a Creator

In The Cell’s Design, I point out that common experience teaches us that codes come from minds. By analogy, the mere existence of the genetic code suggests that biochemical systems come from a Mind. This conclusion gains considerable support based on the exquisite optimization of the genetic code to withstand errors that arise from both substitution and frameshift mutations, along with its optimal capacity to harbor multiple overlapping codes.

The triple optimization of the genetic code arises from its redundancy and the specific codon assignments. Over 10^18 possible genetic codes exist, and any one of them could have been “selected” for the code in nature. Yet, the “chosen” code displays extreme optimization, a hallmark feature of designed systems. As the evidence continues to mount, it becomes more and more evident that the genetic code displays an eerie perfection.5

An elegant contrivance such as the genetic code—which resides at the heart of biochemical systems and defines the information content in the cell—is truly one in a million when it comes to reasons to believe.

Endnotes
  1. Regine Geyer and Amir Madany Mamlouk, “On the Efficiency of the Genetic Code after Frameshift Mutations,” PeerJ 6 (2018): e4825, doi:10.7717/peerj.4825.
  2. David Haig and Laurence D. Hurst, “A Quantitative Measure of Error Minimization in the Genetic Code,” Journal of Molecular Evolution 33 (1991): 412–17, doi:10.1007/BF02103132.
  3. Gretchen Vogel, “Tracking the History of the Genetic Code,” Science 281 (1998): 329–31, doi:10.1126/science.281.5375.329; Stephen J. Freeland and Laurence D. Hurst, “The Genetic Code Is One in a Million,” Journal of Molecular Evolution 47 (1998): 238–48, doi:10.1007/PL00006381; Stephen J. Freeland et al., “Early Fixation of an Optimal Genetic Code,” Molecular Biology and Evolution 17 (2000): 511–18, doi:10.1093/oxfordjournals.molbev.a026331.
  4. Shalev Itzkovitz and Uri Alon, “The Genetic Code Is Nearly Optimal for Allowing Additional Information within Protein-Coding Sequences,” Genome Research (2007): advanced online, doi:10.1101/gr.5987307.
  5. In The Cell’s Design, I explain why the genetic code cannot emerge through evolutionary processes, reinforcing the conclusion that the cell’s information systems—and hence, life—must stem from the handiwork of a Creator.