DNA Researchers Uncover Evidence that Overlapping Genes Are Widespread in Mammal Genomes
I have never cared much for puzzles, but I know some people can't get enough of them. (Although admittedly, I've gotten hooked on Sudoku.)
Whether you are a fan of the Sunday Puzzle or not, I have a word challenge for you. This game illustrates the concept of overlapping genes and hopefully will help you appreciate why I think biochemical systems represent the most profound evidence for intelligent design.
The rules for this game are straightforward. Come up with a sentence (or even a word) that will yield another meaningful sentence (or word) if the reading frame is shifted by either one or two letters to the right or left. For example:
The boy went to the store.
Reading frame shifted by one letter to the left:
T heb oyw entt ot hes tore.
Reading frame shifted by two letters:
Th ebo ywe ntto th est ore.
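The regrouping in the example above can be mimicked in a few lines of Python. This is only a sketch: the function name shift_frame is hypothetical, and the rule it applies (a short leading fragment, then the original word lengths reused until the letters run out) is an assumption inferred from how the example lines are grouped.

```python
def shift_frame(sentence, offset):
    """Regroup the letters of `sentence` after moving the first word
    boundary so that it falls after `offset` letters (assumed rule)."""
    words = sentence.rstrip(".").split()
    letters = "".join(words)

    # Leading fragment created by the shifted boundary (empty if offset is 0).
    chunks = [letters[:offset]] if offset else []
    pos = offset

    # Reuse the original word lengths until the letters are used up.
    for length in (len(word) for word in words):
        if pos >= len(letters):
            break
        chunks.append(letters[pos:pos + length])
        pos += length

    return " ".join(chunks) + "."


original = "The boy went to the store."
print(shift_frame(original, 1))  # T heb oyw entt ot hes tore.
print(shift_frame(original, 2))  # Th ebo ywe ntto th est ore.
```

Checking whether the regrouped letters form real words is the hard part of the puzzle, and the sketch makes no attempt at it.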
I have yet to come up with a sentence that works. Yet, solutions to this type of puzzle abound in the genomes of a wide range of organisms in the form of overlapping genes.
A Biochemistry Primer
Genes consist of a sequence of nucleotides (or genetic letters), abbreviated A, G, C, and T, linked together to form a molecular chain. The specific sequence of nucleotides dictates the amino acid sequence of the protein encoded by a particular gene.
The cell's machinery builds proteins using twenty different amino acids. The specific amino acid sequence determines the way the protein chain folds into a complex and precise three-dimensional structure. The overall shape of the protein dictates its function.
The set of rules the cell's machinery uses to relate the nucleotide sequence of a gene to the amino acid sequence of a protein is called the genetic code. The fundamental unit in the genetic code is a sequence of three nucleotides, referred to as a codon. There are sixty-four codons in the genetic code, since each of a codon's three positions can hold any of the four nucleotides found in DNA (4^3 = 64). The coding assignments of the genetic code are redundant in some cases, since sixty-four codons specify only twenty amino acids. Some amino acids are specified by a single codon only. Other amino acids are specified by several different codons.
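The codon arithmetic and the redundancy of the code can be checked with a short Python sketch. It assumes the standard genetic code, written here in mRNA letters with one-letter amino acid symbols and "*" for a stop signal; the variable names are chosen for illustration only.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop signal.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

# Every ordered triple of bases is a codon: 4 * 4 * 4 of them.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# How many codons specify each amino acid (stop signals excluded).
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
single_codon = sorted(aa for aa, n in codons_per_amino_acid.items() if n == 1)

print(len(CODON_TABLE))            # 64
print(len(codons_per_amino_acid))  # 20
print(single_codon)                # ['M', 'W']
```

In this table only methionine (M) and tryptophan (W) have a single codon, while leucine, serine, and arginine each have six.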
To illustrate how the genetic code translates information stored in DNA into information functionally expressed by proteins, consider the short messenger RNA nucleotide sequence: UCU CCU GCA AUU CGU AU. (To make proteins, the cell's machinery first copies the information housed in the gene by assembling a chain-like molecule called messenger RNA [mRNA]. Like DNA, mRNA consists of a sequence of nucleotides. The cell's machinery uses the same nucleotides to make mRNA as it does to make DNA, with one exception: U is used in place of T.)
UCU CCU GCA AUU CGU AU
If the cell's biochemical apparatus uses a reading frame that begins at the first position, the resulting protein will have the sequence: serine-proline-alanine-isoleucine-arginine, since UCU signifies serine, CCU signifies proline, etc. If the reading frame starts at the second position in the nucleotide sequence, an entirely different protein will be generated with the sequence: leucine-leucine-glutamine-phenylalanine-valine. Shifting the reading frame to the third nucleotide position yields a peptide with the sequence: serine-cysteine-asparagine-serine-tyrosine.
As evinced by this example, there are only three possible reading frames for a nucleotide sequence. Three very different proteins can be encoded by a single nucleotide sequence, simply by shifting the reading frame by either 1 or 2 nucleotides.
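The three readings of the model sequence can be reproduced with a small Python sketch. It assumes the standard genetic code; the helper name translate is hypothetical, and the sketch simply reads codon after codon from the chosen start position without looking for start or stop signals.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop signal.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

NAMES = {
    "A": "alanine", "R": "arginine", "N": "asparagine", "D": "aspartic acid",
    "C": "cysteine", "Q": "glutamine", "E": "glutamic acid", "G": "glycine",
    "H": "histidine", "I": "isoleucine", "L": "leucine", "K": "lysine",
    "M": "methionine", "F": "phenylalanine", "P": "proline", "S": "serine",
    "T": "threonine", "W": "tryptophan", "Y": "tyrosine", "V": "valine",
    "*": "stop",
}


def translate(mrna, frame):
    """Read `mrna` three letters at a time, starting at 0-based position
    `frame`; any incomplete codon at the end is ignored."""
    residues = []
    for i in range(frame, len(mrna) - 2, 3):
        residues.append(NAMES[CODON_TABLE[mrna[i:i + 3]]])
    return "-".join(residues)


MRNA = "UCU CCU GCA AUU CGU AU".replace(" ", "")
for frame in range(3):
    print(f"frame {frame + 1}: {translate(MRNA, frame)}")
# frame 1: serine-proline-alanine-isoleucine-arginine
# frame 2: leucine-leucine-glutamine-phenylalanine-valine
# frame 3: serine-cysteine-asparagine-serine-tyrosine
```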
Biochemists believe that in most cases only one reading frame is used in living systems, and the nonoverlapping, “one gene, one protein” relationship holds. This expectation stems from repeated observations that when a gene's reading frame shifts as a result of a mutation, it almost always leads to catastrophic results. These so-called frameshift mutations result when nucleotides are accidentally inserted into or deleted from a gene in numbers that are not a multiple of three. And, as made evident in the above example with the model nucleotide sequence, a frameshift produces a protein with a radically different amino acid sequence. The mutant protein is almost always nonfunctional junk.
Frameshift mutations stand in contrast to substitution mutations, which involve the replacement of one nucleotide with another. This type of mutation changes, at most, one amino acid in the polypeptide chain. All other amino acids remain unchanged. Substitution mutations can be catastrophic, but more often than not these types of errors have limited, if any, effect on protein function because the gene's reading frame hasn't changed.
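The difference between the two kinds of mutation can be seen by mutating the first fifteen nucleotides of the model sequence in Python. This is a sketch under stated assumptions: it uses the standard genetic code, the particular substitution and insertion are arbitrary choices made for illustration, and the helper names translate and count_changes are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop signal.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna):
    """Translate from the first position into one-letter amino acid
    symbols, ignoring any incomplete codon at the end."""
    return "".join(CODON_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))


def count_changes(a, b):
    """Number of positions at which two peptides differ."""
    return sum(x != y for x, y in zip(a, b))


original = "UCUCCUGCAAUUCGU"
# Substitution: the fifth nucleotide (C) is replaced by A.
substituted = original[:4] + "A" + original[5:]
# Frameshift: one extra A is inserted after the first codon.
inserted = original[:3] + "A" + original[3:]

print(translate(original))     # SPAIR
print(translate(substituted))  # SHAIR
print(translate(inserted))     # STCNS
print(count_changes(translate(original), translate(substituted)))  # 1
print(count_changes(translate(original), translate(inserted)))     # 4
```

The substitution changes a single amino acid (proline to histidine), whereas the one-letter insertion changes every amino acid downstream of it.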
But in some cases two reading frames are used, and two genes overlap on the same nucleotide sequence. In the late 1970s biochemists studying the bacteriophage φX174 (a virus that infects the bacterium Escherichia coli) made a startling discovery: the genome of this bacteriophage directs the production of more proteins than it should, based on the size of its DNA. Researchers resolved this paradox when they demonstrated that some of the φX174 genes overlap (for example, see Nature 264 [1976]: 34-41).
This conclusion was quite unsettling at that time. Biochemists had considered the relationship “one gene, one protein” to be absolute and a cornerstone of molecular biology. Since the work on the bacteriophage φX174 genome, biochemists have identified overlapping genes in other viruses, as well as in bacteria, insects, fish, and mammals. In each case, the cell's machinery reads the overlapping genes using different reading frames.
Researchers noted that in most cases overlapping genes occur in some of the smallest, most compact genomes in nature (viruses and parasitic bacteria, like Mycoplasma genitalium). The prevailing thinking was that overlapping genes were a rarity in more complex creatures because they represent a costly arrangement for the organism: a mutation to one gene also mutates its overlapping partner.
A new study challenges this biochemical orthodoxy. A team of American and European scientists uncovered evidence that overlapping genes may well be widespread in mammal genomes. They point out that:
the skepticism surrounding eukaryotic dual coding is unwarranted: rather than being artifacts, overlapping reading frames are often hallmarks of fascinating biology.
This study provides motivation for molecular biologists to search for more examples of overlapping genes in mammals and other complex organisms. The new expectation is that more and more examples will be found.
The apparent widespread occurrence of overlapping genes doesn't make much sense from an evolutionary perspective, because of the cost they represent to the organism. This cost is only worth it if there is a rationale for overlapping genes. The research team notes that the overlapping genes they uncovered seem to be involved in biochemical multitasking.
Even though it is not a direct analogy to the overlapping genes found in the genomes of organisms, the "overlapping sentence" word challenge highlights how difficult it is to come up with a sequence of letters (or in biochemical systems: nucleotides and amino acids) that houses overlapping messages, even when an intelligent agent diligently seeks out a solution. Yet, solutions to this biochemical conundrum seem to abound throughout nature. For me, the only explanation for overlapping genes is the work of a Creator. It's hard to imagine how undirected evolutionary processes could produce overlapping genes.
For more reasons why biochemical information points to the work of a Creator, see Fazale Rana, "FYI: ID in DNA," Facts for Faith (issue 8, 2002).