Hamlet, Act III, scene ii
Today, when someone “protests too much,” it can raise the suspicion that the protested viewpoint or opinion may actually be true, or at least have legitimate merit. In Shakespeare’s day, however, the word “protest” was used in the same way that we use the word “promise.” So when Queen Gertrude observed that “the Lady doth protest too much, methinks,” she was really saying that “the Lady” was promising more than she could deliver.
A number of scientists are arguing that the ENCODE Project is promising far too much hope for creationists and intelligent design proponents. Both groups lauded the project's September 2012 announcement that, at minimum, 80 percent of the human genome consists of functional elements,1 and pointed to this result as a challenge to one of the best arguments for an evolutionary origin of humanity.2 However, a careful examination of the critiques of ENCODE suggests that these skeptics are so passionate in their criticism that the opposite may be true: namely, the ENCODE Project’s conclusion is actually justified.
In part 1 of this series, I listed what I think to be the most significant criticisms published by ENCODE skeptics. Here in part 2, I will describe and respond to two of the most serious objections: (1) The ENCODE Project used a faulty definition of function; and (2) the project results are absurd in light of the evolutionary paradigm.
How Do Biologists Define Function?
Defining function in biological systems is far from straightforward. In many respects, the definition depends on philosophical considerations as much as anything else. In a published critique of ENCODE, University of Houston biology professor Dan Graur and his coauthors suggest defining functional elements in genomes as either selected effect or causal role. In their critique, researchers Deng-Ke Niu and Li Jiang highlight three ways to define biological function: (1) selected effect, (2) sequences correlated with disease, and (3) essential sequences determined by knockout experiments. As part of his assessment of ENCODE, biochemist W. Ford Doolittle identified selected effect, regions deemed essential as determined by ablation (knockout experiments), and mere existence of sequences in the genome as three distinct definitions for function.
In all three cases, the research teams argue that the ENCODE Project employed a faulty definition for biological function. These skeptics maintain that if the ENCODE scientists had used an appropriate definition, then they would have discovered that only 5 to 10 percent of the human genome is functional, not 80 percent.
Yet, the ENCODE skeptics can’t agree on the best way to define biochemical utility or to determine which genetic sequences are functional. Moreover, the three research teams disagree on which definition the ENCODE team employed when assigning function to the human genome’s sequences.
Definitions of Function
For the purposes of this article, it is worth briefly examining the different definitions for biochemical function that are in play.
- Selected effect: According to Doolittle, “the functions of a trait or feature are all and only those effects of its presence for which it was under positive natural selection in the (recent) past for which it is under (at least) purifying selection now. They are why the trait or feature is there today and possibly why it was originally formed.”3 In other words, sequences in genomes can be deemed functional only if they arose through evolutionary processes to perform a particular function. Once evolved, these sequences, if they are functional, will resist evolutionary change (due to the effects of natural selection) because any alteration would compromise the function of the sequence and endanger the organism. Deleterious sequence variations would be eliminated from the population due to the reduced survivability and reproductive success of organisms possessing those variants. Hence, functional sequences are those under the effects of selection.
- Sequences associated with diseases: Niu and Jiang point out that one way to determine function is to ask whether variations in the sequence are associated with a disease. The idea is that if a sequence alteration results in a genetic disorder, then the sequence must have some utility.
- Essential sequences determined by ablation: Genetic ablation, or knockout experiments, could be useful for identifying functional sequences in genomes. According to this thought, if an organism can tolerate the disabling or removal of a particular DNA sequence within its genome, then this sequence must not be functional. Conversely, if deactivation or elimination of a specific DNA sequence leads to the organism’s death, then that sequence must be essential and, therefore, functional.
- Existence of sequences in genomes: Doolittle points out that the mere presence of a sequence or process associated with genomes could be taken as evidence for their function. In other words, if the DNA sequence is found in the genome, it must be there for some reason. Doolittle explains, “Because a region is transcribed, its transcript must have some fitness benefit, however remote.”4
- Causal definition: According to Graur’s team, “for a trait, Q, to have a ‘causal role’ function, G, it is necessary and sufficient that Q performs G.”5 In other words, the causal definition ascribes function to sequences that play some observationally or experimentally determined role in genome structure and/or function.
Graur and his team prefer the selected effect definition. They write that “only sequences that can be shown to be under selection can be claimed with any degree of confidence to be functional.”6 Doolittle also prefers this definition. These ENCODE skeptics argue that the selected effect definition is the only one that fits naturally into the context of the evolutionary paradigm. Graur’s group believes “most biologists use the selected effect concept of function, following the Dobzhanskyan dictum according to which biological sense can only be derived from evolutionary context.”7
But Graur’s team and Doolittle readily acknowledge that it can be difficult to determine which sequences in a genome are under selection. Niu and Jiang appear sympathetic to the selected effect definition, too, but they point out that some functional regions of genomes aren’t under selection and yet remain critical. That is, Niu and Jiang believe the selected effect definition underdetermines functional regions of the genome. They prefer to define functional regions through ablation. This definition is not on the radar screen for Graur’s team; Doolittle dismisses it outright because he sees it as being equivalent, in essence, to a causal definition. Based on how I understand their arguments, Niu and Jiang would disagree with Doolittle. They would argue that ablation serves as a proxy for natural selection.
How Did the ENCODE Project Define Function?
All of these critics reacted strongly to the way the ENCODE Project assigned function to sequences in the human genome. Graur and his team accuse the ENCODE researchers of adopting a “strong version” of causal function. Doolittle, however, argues that ENCODE used the “mere existence” definition of function. Niu and Jiang don’t specify how they believe the ENCODE Project defined biological function, but from reading their paper, I get the impression they would most likely agree with Graur’s group.
How, then, did the ENCODE Project define function? I don’t think Doolittle is correct in his assessment. It seems to me that the ENCODE Project did more than assign function to sequences based on their mere existence in the human genome. I would maintain that the ENCODE Project employed a causal definition of function. The ENCODE Project focused on experimentally determining which sequences in the human genome displayed biochemical activity using assays that measured:
- binding of transcription factors to DNA,
- histone binding to DNA,
- DNA binding by modified histones,
- DNA methylation, and
- three-dimensional interactions between enhancer sequences and genes.
The implied assumption is that if a sequence is involved in any of these processes, all of which play well-established roles in gene regulation, then that sequence must have functional utility. To use Graur’s lingo: sequence Q performs function G; therefore, sequence Q is functional.
Is There Anything Wrong with the Way the ENCODE Project Defined Function?
So what’s wrong with the causal definition of function? From my vantage point: nothing. Biochemists typically determine function using this definition. Even Doolittle acknowledges this point. He states,
The approach embodies what philosophers would call a causal role definition of function and supposedly eschews evolutionary or historical justifications. Much biological research into function is done this way.8
In fact, this approach is consistent with how scientists operate routinely. They perform experiments to determine cause-and-effect relationships.
But Graur’s group, Niu and Jiang, and Doolittle reject the causal definition. Why? For no other reason than that a causal definition ignores the evolutionary framework when determining function. For many biologists this practice is unthinkable. They insist that function be defined exclusively within the context of the evolutionary paradigm. In other words, their preference for defining function has more to do with philosophical concerns than scientific ones—and with a deep-seated commitment to the evolutionary paradigm.
As a biochemist, I am troubled by the selected effect definition of function because it is theory-dependent. In science, cause-and-effect relationships (which include biological and biochemical function) need to be established experimentally and observationally independent of any particular theory. Once these relationships are determined, they are then used to evaluate the theories at hand. Do the theories predict (or at least accommodate) the established cause-and-effect relationships, or not?
Using a theory-dependent approach poses the very real danger that experimentally determined cause-and-effect relationships (or, in this case, biological functions) will be discarded if they don’t fit the theory. And, again, it should be the other way around. A theory should be discarded, or at least reevaluated, if its predictions don’t match these relationships.
In fact, this is exactly why Graur’s group feels motivated to protest the conclusion of the ENCODE Project. According to the ENCODE results, 80 percent of the human genome contains functional DNA sequences. Yet, other measurements say only 10 percent of the human genome is under selection. This means that 70 percent of the genome must be functional, without being under the influence of selection. This brings us to another critique on the list I compiled in part 1: specifically, that the results of the ENCODE Project are absurd in light of the evolutionary paradigm.
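The arithmetic behind that 70 percent figure can be sketched in a few lines of Python. This is only an illustration of the subtraction described above, not anything drawn from the ENCODE papers or the critiques. The function name and constants are hypothetical, and the sketch assumes that every sequence under selection is also counted among ENCODE's functional sequences.

```python
# Figures quoted in the text, as percentages of the human genome.
ENCODE_FUNCTIONAL_PCT = 80  # ENCODE's minimum estimate of functional sequence
UNDER_SELECTION_PCT = 10    # upper end of the skeptics' 5 to 10 percent range


def functional_without_selection(functional_pct, selected_pct):
    """Percent of the genome that is functional but not under selection.

    Assumption (not stated in the source): sequences under selection
    form a subset of the sequences counted as functional.
    """
    if not 0 <= selected_pct <= functional_pct <= 100:
        raise ValueError("expected 0 <= selected_pct <= functional_pct <= 100")
    return functional_pct - selected_pct


gap = functional_without_selection(ENCODE_FUNCTIONAL_PCT, UNDER_SELECTION_PCT)
print(f"Functional but not under selection: {gap} percent")
```

Using the lower end of the skeptics' range (5 percent) in place of 10 widens the gap rather than narrowing it.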
In the Light of Evolution
Graur and his coauthors argue that the discrepancy between the ENCODE results and the 10 percent figure is absurd and, therefore, ENCODE must be wrong. But the only reason to cite this discrepancy as a motivation for rejecting the ENCODE conclusions is if one embraces the selected effect definition of function, which is inextricably intertwined with the evolutionary paradigm. If, however, one uses the theory-independent causal definition employed by ENCODE, then 80 percent of the human genome is functional because it has been experimentally determined to be so. This result stands independent of the theory of evolution. In fact, one could argue that the mismatch between the theory’s prediction and the experimental data is a basis to be skeptical about biological evolution. As the old adage goes, “Theories guide, experiments decide.”
So what about Niu and Jiang’s proposal that function should be determined from knockout experiments? It is true that life scientists often employ this strategy to determine biological function. Depending on how one regards this approach, it could be seen as compatible with either the selected effect or the causal definitions. But, in my opinion, the ablation approach will underdetermine the functional sequences in an organism’s genome.
Researchers have determined the essential, minimal gene set for a number of microbes using knockout experiments. They consistently find that the essential gene set is but a fraction of the entire genome. Yet, this essential set is context dependent. In other words, it depends on the environment in which the microbe finds itself. If the microbe is growing in a nutrient-rich laboratory medium, the required gene set looks very different from the ensemble of genes required if the microbe is growing outside the lab with only limited access to nutrients. A gene that appears dispensable in one setting may be indispensable in another, so a knockout experiment run under a single set of conditions will miss sequences that are genuinely functional.
Up to this point, it appears as if the criticisms of the ENCODE results are more philosophical than scientific. The ENCODE skeptics’ insistence on defining biological function in a theory-dependent manner seems motivated more by their own philosophical concerns than genuinely scientific ones. A strong pre-commitment to the evolutionary paradigm also factors into their objections, because the ENCODE results do not fit comfortably within that paradigm. Yet, there is no reason to believe that the ENCODE Project’s method for assigning function to sequences in the human genome is fundamentally flawed. In fact, it is consistent with how science operates in general.
In the third and last installment of this series, I will discuss the four remaining objections to the ENCODE Project results.