Archive for the ‘Biology’ Category

Little Motors, Big Designer

Friday, January 30th, 2009

Previously Posted on Jan 18th, 2008 by David H. Rogstad, Ph.D.

Photo of Dave RogstadAs a student I came across the humorous definition of a nuclear physicist as one who was “learning more and more about less and less, until finally he knew everything about nothing.” In today’s world of research, this reference to the wonders of nature at its tiniest levels could also be said about the biologist. Every day we are treated to new discoveries revealing the amazing intricacies of the biological cell and the molecular machines that govern its functionality, all at a size that requires an electron microscope to even begin to see.

In the middle of last year, Science Magazine published a fascinating review article on the subject of molecular motors and their use in nanotechnology. In the first part of the article, the authors point out how the cell is best described as a miniature factory where literally thousands of machines perform various specialized tasks. These functions include: allowing the cell to replicate itself in under an hour (what factory do you know of that can perform this feat?), proofreading and repairing errors in its own manufacturing instructions (DNA), sensing its environment and responding to it, changing its shape and morphology, and obtaining energy from photosynthesis or metabolism.

To accomplish all of these tasks, the cell has a wide variety of specialized molecular motors that are direct analogs of the kind of devices that engineers design and build for man-sized factories. These include: “electric” motors having stators, rotors, shafts, bearings and universal joints; transport “trucks” that provide stepwise motion along “highways” called microtubules or filaments; and pumps made from tubes and cams that force fluids along the tubes. The major differences between these molecular motors and those made by humans are their size (a billion times smaller) and their efficiency (near 100 percent vs. 65 percent, at best).

If biomolecules can be successfully integrated into nanotechnology devices, there are several advantages, including the self-assembly characteristics of protein-based machines, the possibility of using other biological components from nature, and the fact that the processes for manufacture are environmentally benign and occur under mild conditions.

Research efforts in nanotechnology over the past several decades have produced various components of the machinery, like cogwheels or pumps, but have not yet been able to produce the motors needed to make the machinery go. The article asks whether the nano-machines found in nature can be used directly or serve as templates. So far, results indicate protein motors can be interfaced and made to drive the man-made nanoscale components but have limited lifetimes of only a few days. To date, no usable devices have been made. However, in the near future it is likely that progress will be made using the parts from cells, eventually allowing researchers to build tailor-made devices for the sorting of materials, assembly of different materials, concentration of materials for enhanced detection, along with many of the functions performed within cells.

One thing is clear: the machines found in cells are absolutely remarkable in their characteristics, challenging the minds and creativity of the most advanced researchers in nanotechnology. Yet, they are almost identical in form (but superior in efficiency and size) to the mechanical devices that the best engineers design for everyday life. Surely the biomachines found in cells require a level of intelligent design far greater than what man has accomplished!

Error Control Coding in Biology Implies Design, Part 4 (of 5)

Friday, December 12th, 2008

Keith McPherson

Photo of KeithMcPhersonKeith McPherson received his Master of Science in Electrical Engineering from Georgia Institute of Technology in 1993, and currently works as an electrical engineer in Melbourne, FL, in the fields of communications and signal processing.

 --------------------------------------

Anyone up for tic-tac-toe and genetics? It’s not exactly a game, but grab a cup of coffee and let’s explore the intriguing design in genetic systems.

In parts 1, 2, and 3 of this series, we observed three features of genetic systems that suggest design. In this article we look at isomorphic systems to find yet another analogy in the genetic code.

Analogy: Genetic Code-Like (GCL) Binary Representation

Recall the genetic code mapping table discussed in part 1 and part 2 of this series. This table describes the mapping between the 64 codons and the 20 amino acids.

Researchers in Venice, Italy, have identified a specific and unique number system and have used it to mathematically model the genetic code. In this work, the 64 codons and the 20 amino acids are assigned to numerical elements within the system, referred to as the genetic code-like (GCL) binary representation. The GCL binary representation is a unique model that holds true to the specific redundancy features in the natural genetic code mapping.1 It is a mathematical model of the underlining physical/chemical processes related to genetic information processing—a so-called structural isomorphism.

An isomorphism is a one-to-one correspondence between the elements of two sets such that the result of an operation on elements of one set corresponds to the result of the analogous operation on their images in the other set. If two sets are isomorphic with respect to certain properties, then those properties that are true of one of the sets must also be true of the other.

For example, a six-sided die and a bag from which a number 1 through 6 is chosen are isomorphic. As another example, tic-tac-toe and the “game of 15” are isomorphic. In the game of 15, players take turns saying a number between 1 and 9. Numbers may not be repeated. Both players aim to say three numbers that add up to 15. Although perhaps not obvious, the defining characteristics of this number game are identical to those of tic-tac-toe. It turns out that both games are based on the well-known (to mathematicians) magic square.

The GCL binary representation and the genetic code are also isomorphic systems (sets). So, characteristics that are true of the GCL binary representation must also be true of the genetic code. (See here and here for more details of isomorphic systems.)

What are the characteristics of the GCL binary representation? The European researchers noted that this mathematical model exhibits:2

  • Palindromic symmetry
  • Parity symmetry
  • Organized redundancy
  • A rich mathematical structure

Such elegant symmetry, organization, and structure speak of a code that has been designed for a purpose—no mere afterthought of evolutionary chance events.

Also, the GCL binary representation makes possible the existence of error detection/correction codes that operate along the strands of DNA. A parity code (as discussed in part 3 of this series) is one example of such a technique.

If a parity code or similar technique functions along strands of DNA using the GCL binary representation, then dependence must exist in the genetic data along the strand. In other words, the data must be correlated to some degree.

Assuming the GCL binary representation and using two different robust statistical analysis methods, the research team discovered significant short- and long-range correlation peaks in actual DNA sequences. These results confirm that actual DNA sequences using this specific model satisfy a basic prerequisite for such error-minimizing techniques.

The team noted that “an error-control mechanism implies the organization of the redundancy in a mathematically structured way,” and that “[t]he genetic code exhibits a strong mathematical structure that is difficult to put in relation with biological advantages other than error correction.”

Thus, scientific advance has uncovered a peculiar and unique mathematical model that accounts for the key properties of the genetic code. This model exhibits symmetry, organized redundancy, and a mathematical structure that would be vital for the existence of error-coding techniques operating along the DNA strands. Actual DNA data tested using this model gives a strong hint that such further error-coding techniques may very well exist and provides impetus for future study in this area. Purposeful creation seems a reasonable conclusion.

Part 5 (the last entry) of this series will discuss the impact of these analogies on William Paley’s watchmaker argument.

Notes/References:

  1. It describes exactly the 1st level of degeneracy (i.e., redundancy) in the natural genetic code (i.e., the number of codons which map to specific amino acids), and gives deep insight into the 2nd level of degeneracy (i.e., the association between specific codons and specific amino acids).

  2. A full treatment of how the GCL binary representation displays these features is very technical and difficult to communicate without active use of visual aids. Nevertheless, this note is a brief attempt. See here for the necessary visual aid. Table 3 shows the actual GCL binary representation of the natural genetic code. The numerical elements in the model are associated with biochemical elements in the genetic code. The whole numbers 0–23 along the table edges map to amino acids. The 6-bit binary strings map to codons. The degeneracy number is shown in the center of the table, along with the amino acids coded for. The degeneracy number indicates how many different codons code for each particular amino acid. A careful inspection will show that all 64 codons are present, along with all 20 amino acids. Palindromic symmetry is seen as a reflection through the middle of the table in a left to right fashion. Palindromic amino acids are always associated in pairs. For example, tryptophan (Trp) and methionine (Met) are a pair of palindromic amino acids. Note that if the 6-bit codewords are folded on top of each other through the middle of the table, they form bitwise negated pairs, reflecting a very peculiar mathematical structure for palindromic symmetry. Also note the symmetry evident in the parity, again through the middle of the table. The light entries are odd parity (odd number of 1’s) and the dark entries are even parity (even number of 1’s). The interested reader is referred to the journal paper for further details.

Error Control Coding in Biology Implies Design, Part 3 (of 5)

Friday, December 5th, 2008

Keith McPherson

Photo of KeithMcPhersonKeith McPherson received his Master of Science in Electrical Engineering from Georgia Institute of Technology in 1993, and currently works as an electrical engineer in Melbourne, FL, in the fields of communications and signal processing.


   ------------------------------------

Parts 1 and 2 of this series observed that biological genetic systems function as information-processing systems, and a case was made for coding techniques that protect the genetic data. As a specific example, the genetic code appears designed to minimize the effects of errors in a way that is directly analogous to Gray codes. Gray codes are commonly used by engineers to protect data processed by many modern digital communications systems.

We now turn our attention to another analogy.

Analogy: Complementary Base Pairing Parity Code

A useful concept to have in mind to better appreciate this analogy is Hamming distance.1 Good codes reduce the probability of error by increasing the minimum Hamming distance between codewords relative to the distance that would have been obtained if no code (or a less powerful code) was used. This minimum distance of a code is like the weakest link in a chain and, therefore, characterizes the code’s strength.

As the code’s minimum distance is increased, it is easier for a recipient to detect a message with no errors. Consider a scavenger hunt game where you “hide” something for a toddler to find, for example, a large blue ball among a pile of smaller white balls. The intent is to make the target blue ball stand out among the others. This exercise is roughly analogous to high Hamming distance. The large blue ball (the intended message) among a number of smaller white balls (errors) exhibits a relatively large dissimilarity, and so the blue ball is easily visible and detectable.

On the other hand, as the code’s minimum distance is decreased the recipient is more likely to make errors in message detection. Now consider an adult scavenger hunt where objects are “hidden” in plain sight because they blend in so nicely in their surroundings. It is difficult to “see” an object you are looking for, even if you are staring straight at it. The intent here is to make the object blend in with the objects around it. This is roughly analogous to low Hamming distance. The target object (the intended message) is hidden among very similar objects (errors) and exhibits a relatively large similarity. Thus, the desired object is not easily visible and detectable.

In digital communications perhaps one of the simplest examples of an error-detecting code (see here and here) is an even parity code. This code is used on a binary message frame (i.e., a sequence of binary digits, 1’s and 0’s). In this code, one parity bit is added to a message frame and its value is chosen to “round” the frame “value” out, to make the message stand out more among the possibilities by increasing the code’s minimum distance (like the example with the blue ball). This allows the recipient to more easily detect that an error has occurred if it detects a “non-round value” (i.e., a white ball). Values that are “round” or “non-round” have precise mathematical definitions in coding theory. The main point is that all parity codes, and the even parity code in particular, impart a precise mathematical structure to the protected (coded) data. This mathematical structure increases the minimum distance between valid codewords and allows for more robust error detection. (See here for more information on parity codes used in engineering.)

Recall that the DNA is a double-strand structure, specifically a double helix. And the four nucleotide bases in the DNA chemical alphabet are A, C, G, and T. Nucleotides A and T are complementary, as are G and C, and these pairings are the basis for the double-stranded structure, where each strand carries the same information as the other strand. Research into the chemical bonds at work between these complementary base pairs reveals that the natural nucleotide alphabet has been chosen to minimize the probability that a given nucleotide on one strand will be incorrectly paired with a partner on the opposite strand. More specifically, a researcher found that the nucleotides used for the DNA chemical alphabet actually form an even parity code. (See research work here and here.)

A convention was used to consistently assign binary values (i.e., 1 or 0) to certain features associated with the four nucleotides that comprise the chemical alphabet in DNA. The relevant features are the relative size of the nucleotide, and its donor-acceptor pattern. The donor-acceptor pattern is relevant for hydrogen bonding. Hydrogen bonds are formed between a nucleotide and its partner on the opposite strand. Careful observation of the resulting binary values reveals that they form an even bit parity code. To be more precise, the relative size of a nucleotide is related to its hydrogen donor-acceptor (D/A) pattern as a parity bit.2

In nature, there are actually 16 nucleotides. Why did nature settle on these specific four, and why only four? At first glance, one may impugn a designer’s inefficiency because there are more nucleotides that could have been used to increase the size of the alphabet, leading to a more efficient genetic code and protein synthesis mechanism. In fact, there has been speculation along these lines. The researcher used the same binary convention to determine the binary representation for the other 12 nucleotides. Upon close inspection, he found that the 16 total nucleotides can be arranged using this framework as eight belonging to the even parity set, and eight belonging to the odd parity set, where the natural alphabet uses a subset of the even parity nucleotides. “Nucleotide Hamming distance” is maximized as a result, as is typical for parity codes, leading to a robust mechanism for error minimization.

The resulting specific four nucleotides emerge as optimal. From this perspective, the genetic machinery is directly analogous to a 1-bit, even-parity code decoder as used routinely in engineering applications. Such a decoder is the optimal way to recover the intended message when a parity code has been used.

The parity code model is an interpretation that readily flows from the relevant chemical bonds that bind the complementary nucleotide pairs. It is a way to mathematically represent or express what is happening at a chemical level. The researcher comments that:

The purine-pyrimidine and hydrogen donor-acceptor patterns governing nucleotide recognition are shown to correspond formally to a digital error-detecting (parity) code, suggesting that factors other than physiochemical issues alone shaped the natural nucleotide alphabet…When this error-coding approach is coupled with chemical constraints, the natural alphabet of A, C, G, and T emerges as the optimal solution for nucleotides.3

In summary, we have seen that an error-detecting code (a parity code) is at work to minimize incorrect bonding between nucleotide pairs on the complementary strands of DNA. For DNA replication to be accurate it is critical that the strands be the true complement of each other. We furthermore note that the specific code used by DNA, an even parity code, is a mainstay in modern communications systems.

The next article in this series will examine another coding analogy between modern digital communications systems and the genetic information-processing system.

Notes/References:

  1. See here and here for more information on Hamming distance.

  2. Refer to Figure 1 here. A and G are larger nucleotides called purines, C and T are smaller nucleotides called pyrimidines. Lone pairs are rich in electrons and participate in weak bonding with hydrogen atoms to form hydrogen bonds between complementary pairs. Hydrogen atoms are also referred to as hydrogen donors, and lone pairs as hydrogen acceptors. A binary “1” was assigned for hydrogen (i.e., hydrogen donors, D). A binary “0” was assigned to “lone pairs” (i.e., hydrogen acceptors, A). A binary “1” was assigned to the smaller nucleotides (i.e., pyrimidines). A binary “0” was assigned to the larger nucleotides (i.e., purines).

  3. Dónall A. Mac Dónaill, “A Parity Code interpretation of Nucleotide Alphabet Composition,” ChemComm 18 (2002): 2062-63.