Moore’s law and the origin of life: a study in demarcation

Here I want to move from the broad overview I gave last time to a specific recent case at the borderlands between science and pseudoscience: Alexei A. Sharov and Richard Gordon’s paper “Life Before Earth,” currently unpublished but in preview at Cornell’s arXiv. PZ Myers has already harshly commented on the paper, clearly relegating it to the dustbin of pseudoscience. While I don’t disagree with PZ’s overall assessment, I think it is instructive to dig a bit deeper and unpack the reasons for such a judgment. After all, Sharov and Gordon are actual scientists (the first at the National Institute on Aging, the second at the Gulf Specimen Marine Laboratory), and the paper has all the appearance of legitimate, if boldly out of the mainstream, science. (Of course the fact that it hasn’t yet been published in a peer-reviewed journal is an issue; but it’s not as if legitimate scientific journals don’t publish their share of garbage anyway.)

To begin with, it is not entirely clear what the Sharov and Gordon paper is actually about. Yes, their main claim appears to be that one can take Moore’s so-called law (actually a straightforward empirical generalization based on a very specific data set), a linear relationship between the logarithm of CPU transistor counts and their dates of introduction, and apply it to the problem of when life might have originated in the universe. Here is the claim the authors make as a result of their exercise: “Linear regression of genetic complexity (on a log scale) extrapolated back to just one base pair suggests the time of the origin of life = 9.7 ± 2.5 billion years ago.” This would be stunning, if true, for the simple reason that earth itself is only 4.54 billion years old, which would squarely place the origin of life in some other region of the cosmos.

Before we get to the meat of Sharov and Gordon’s claim, however, it is worth noting what is so confusing about the paper in the first place. Their main result is presented in Section 1 of the 26-page work, with much of the rest devoted to points that are only tangentially related, or not related at all, and about which the authors provide an odd mix of a meager overview of the literature and their own (largely unsubstantiated) opinions. For instance, Section 2 deals with the variability of the rate of evolution as well as Gould and Eldredge’s famous theory of punctuated equilibria. While Sharov and Gordon do need to reassure their readers that their proposed trend of increasing genomic complexity isn’t undermined by excessively wild fluctuations in that rate over geological time, this actually has little if anything to do with punctuated equilibria, a theory about morphological evolution that has so far been applied only to a relatively small subset of biological taxa.

Section 3, addressing the question of why genomic complexity (allegedly) increased exponentially during the history of life on earth, is highly speculative to say the least (invoking concepts such as “evolution as cascading emergencies”), not to mention extremely brief.

Section 4 addresses the almost comical question of whether life could have originated from a single nucleotide (the smallest possible unit of genetic information). Most practicing biologists would answer “hell no” to that question, but Sharov and Gordon treat us to a highly idiosyncratic (and even more debatable) tour of origin-of-life theories, including the RNA world, the idea of Graded Autocatalysis Replication Domains, the theory of autocatalytic reactions and so forth. Interesting, highly speculative, marginally relevant.

Section 5 is yet another, more in-depth detour through hypotheses that tackle the problem of going from (very speculative) early surface metabolism (i.e., heritable metabolic systems that allegedly evolved on mineral surfaces) to the RNA world, and finally to the evolution of the first cells. Add some further speculation on LUCA (the Last Universal Common Ancestor of all life forms on earth), and you arrive at Section 6, which addresses the question of how life could possibly survive in the interstellar void.

This is of course necessary because if life originated before earth did, then it follows that some sort of panspermia-type hypothesis must hold: life got started somewhere else, and then somehow made it to the third planet in our solar system. Needless to say, this bit of the paper is also highly speculative, having to do with the possibility that the solar system got its building materials from the explosion of a nearby star, for instance. Moreover, the authors bring up the discovery of bacteria that survived in ice for 750,000 years as if that were a reasonable approximation to millions of years of existence in the interstellar void, and so forth.

We then move to Section 7 of the Sharov and Gordon paper, where they explore the implications of a cosmic origin of life, speculating (wildly) that eventually we may be able to reconstruct a single evolutionary tree elucidating the phylogenetic relationships between terrestrial and extraterrestrial bacteria (never mind, of course, that no sample of the latter is in sight at the moment...). Interestingly, the authors come down against the idea of “intelligent panspermia,” i.e., the possibility that life on earth (and other planets) was seeded on purpose by extraterrestrial intelligent beings. How do they know that? They state that the evolution of intelligence requires 10 billion years, though we are not told how they arrived at such a bold and confident conclusion. In the same section the authors also manage to trash the famous Drake equation (one of the few theoretical foundations of the SETI program), and to answer the infamous Fermi paradox: the reason we haven’t heard from other intelligences in the universe is that we are likely the first to appear on the block.

[Incidentally, and disturbingly, a good number of the references given in the paper are to Wikipedia entries! Not exactly the highest standard of scholarship I can think of.]

Section 8 deals with the way genetic complexity lags behind the complexity of the mind. What does that have to do with the only original point of the paper (remember? The one about Moore’s law and the timing of the origin of life)? Not much, but what the hell. Here we find claims such as that humans are “superior” (not an evolutionary term) to mice (I guess these guys never read The Hitchhiker’s Guide to the Galaxy), and that languages too evolve following Moore’s law: did you know, for instance, that Chinese went from 2,500 characters 3,200 years ago to 47,000 characters today, which yields a “rate of language doubling time” (huh?) of 825 years, which in turn “exceeds the rate of brain increase in evolution by a factor of >3000”? Wow. Meanwhile, of course, alphabetic languages such as English haven’t “evolved” much by this measure, despite their ability to produce Shakespeare, or the Sharov and Gordon paper...
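The arithmetic behind a “doubling time” is easy enough to check. Assuming steady exponential growth, the doubling time is T_d = Δt · ln 2 / ln(N_end / N_start). Here is a minimal Python sketch (the function name `doubling_time` is my own, for illustration only) applied to the figures quoted above:

```python
import math


def doubling_time(n_start: float, n_end: float, elapsed_years: float) -> float:
    """Doubling time implied by exponential growth from n_start to n_end.

    Assumes N(t) = n_start * 2 ** (t / T_d), which rearranges to
    T_d = elapsed_years * ln 2 / ln(n_end / n_start).
    """
    if n_start <= 0 or n_end <= n_start:
        raise ValueError("need 0 < n_start < n_end for a positive doubling time")
    return elapsed_years * math.log(2) / math.log(n_end / n_start)


# Figures quoted in the text: 2,500 characters 3,200 years ago, 47,000 today.
t_d = doubling_time(2_500, 47_000, 3_200)
print(f"implied doubling time: {t_d:.0f} years")
```

With these two data points the calculation gives roughly 756 years rather than 825, so the authors presumably worked from different figures or a different fitting procedure. Either way, the number only means something if the growth really is exponential, which is precisely what is in question.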

But wait! We ain’t done yet! Section 9 goes on to extrapolate the growth of complexity into the future, because extrapolating it back into the past isn’t a hazardous enough practice. To their credit, Sharov and Gordon dismiss Ray Kurzweil’s ideas about a forthcoming “technological singularity,” though they do so not because of the concept’s incoherence and lack of empirical support, but on the rather simplistic assumption that, you know, humans will always be in control of the power grid, so all we need to do to stem the onslaught of the Cylons is to pull the plug...

The tenth and last section of the Sharov and Gordon paper has to do with a “biosemiotic perspective” on things, that is, with the standpoint that considers living organisms qua agents. From there the authors immediately slide into semi-incoherent talk of reintroducing goals and meanings into the natural sciences, give a completely irrelevant nod to my own field of phenotypic plasticity studies, and promptly quit.

All of the above doesn’t quite cross into pseudoscience, though it repeatedly skirts perilously close to that fuzzy borderline. A charitable reading is that the bulk of Sharov and Gordon’s paper is a somewhat disjointed, highly speculative tour of origin-of-life and (somewhat) related studies. But what about the core of their manuscript, Section 1?

Well, the first highly questionable statement there is that “the core of the macroevolutionary process ... is the increase of functional complexity of organisms.” No, it isn’t. Stephen Jay Gould long ago persuasively argued that there is no necessary direction of increased complexity throughout evolution. The only reason complexity historically follows simplicity is that life had to start simple, so the only direction in which it could (stochastically, not purposefully) move away from its starting point was toward “more complex.” It’s a so-called “left wall” effect: if you start walking (even randomly) from right next to a wall, you tend to end up farther away from it, simply because you cannot walk through it. And of course, as Gould again pointed out, life on earth was (relatively) simple and bacterial for a long, long time, and none the worse for it either. Moreover, the most complex organism on earth (us), though very successful in certain respects, is actually a member of a very small and often struggling group of large-brained social animals. Measured by criteria such as biomass, bacteria still beat the crap out of us “superior” beings.
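The left wall effect is easy to see in a toy simulation. The Python sketch below is purely illustrative and rests on my own simplifying assumptions: “complexity” is a single integer, every lineage starts at the minimum value of 1, and at each step it moves up or down by one with equal probability, except that it cannot drop below the wall. The function name and parameters are hypothetical.

```python
import random
import statistics


def left_wall_walk(n_lineages: int, n_steps: int, wall: int = 1, seed: int = 0) -> list[int]:
    """Unbiased random walk in 'complexity' with a hard floor at `wall`.

    Each lineage starts at the wall and at every step moves +1 or -1 with
    equal probability; a move that would go below the wall leaves it at the wall.
    Returns the final complexity of every lineage.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    finals = []
    for _ in range(n_lineages):
        c = wall
        for _ in range(n_steps):
            c = max(wall, c + rng.choice((-1, 1)))
        finals.append(c)
    return finals


finals = left_wall_walk(n_lineages=1_000, n_steps=1_000)
print("mean complexity:  ", statistics.mean(finals))
print("median complexity:", statistics.median(finals))
print("fraction within 10 of the wall:", sum(c <= 11 for c in finals) / len(finals))
```

Even though no individual step favors increased complexity, the average drifts upward, simply because from the wall there is nowhere to go but up. The median typically stays below the mean, a sign that the distribution is skewed, with many lineages remaining relatively simple.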

But the real problems for the Sharov and Gordon paper begin when the authors finally get to the business at hand: correlating genomic complexity with the time of origin of the respective organisms, and then extrapolating back in time. [As a commenter on my Twitter stream pointed out, they could just as “reasonably” have extrapolated into the far future, arriving at the conclusion that the entire universe will eventually be made of DNA...]

The authors realize that simple genome length won’t cut it, because what matters is functional complexity, and some portions of the genomes of various organisms are redundant and possibly without function. Nonetheless, they end up plotting the base-10 logarithm of genome size against time, which is how they arrive at the figure of 9.7 billion years ago for the origin of life. As PZ Myers quickly pointed out, however, even if we accept the procedure at face value, they simply cherry-picked the data: plenty of organisms that don’t show up on the graph (plants and fungi, for instance) would completely scramble the results. Make no mistake about it: this is a fatal blow to the entire enterprise, and one that the authors ought to have thought about well before posting the paper.
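To see how this kind of extrapolation works, and how fragile it is, here is a minimal Python sketch of the procedure as I understand it: fit a straight line to log10(genome size) against age by ordinary least squares, then solve for the age at which the line reaches log10(1) = 0, i.e., a single base pair. The data points are entirely hypothetical, chosen only so the arithmetic is easy to follow; they are not Sharov and Gordon’s data, and the function names are my own.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def origin_time(ages_gyr, genome_bp):
    """Age (billions of years ago) at which the fitted line reaches one base pair.

    Fits log10(genome size) = a + b * age, then solves a + b * age = log10(1) = 0.
    """
    a, b = fit_line(ages_gyr, [math.log10(g) for g in genome_bp])
    return -a / b


# Hypothetical points (age in Gyr ago, genome size in bp), for illustration only.
ages = [0.0, 1.0, 2.0, 3.0]
sizes = [10**9, 10**8.5, 10**8, 10**7.5]
print(f"extrapolated origin: {origin_time(ages, sizes):.1f} Gyr ago")

# Add one hypothetical lineage with a much larger genome and refit.
ages_plus = ages + [0.5]
sizes_plus = sizes + [10**11]
print(f"with one extra point: {origin_time(ages_plus, sizes_plus):.1f} Gyr ago")
```

With the four made-up points the line reaches one base pair 18 billion years ago; adding a single made-up lineage with an unusually large genome moves the estimate to about 12.2 billion years ago. Which organisms make it onto the graph matters enormously, and that is the cherry-picking problem in a nutshell.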

The second fundamental problem, of course, is with Moore’s law itself: as I mentioned at the beginning, it was derived empirically from a very specific data set having to do with a particular type of human technology. There is no reason on earth (or beyond it!) to assume that the “law” (actually, a limited empirical generalization) should hold for measures such as genomic complexity, or for natural phenomena that are not of human origin.

Lastly (third fatal blow), the problem is with the procedure of statistical extrapolation itself. Extrapolation is very useful, of course, but it needs to be deployed with much caution. As anyone taking a Stat 101 course soon learns, interpolation (i.e., curve fitting within the available range of data points) is a very effective and reliable technique for predicting “missing” data; but extrapolation (i.e., extending the fitted curve beyond the available data range) is a tricky business. Unless one is very confident that whatever mechanism underlies the relationship among the data actually holds regardless of range, one is on very shaky ground.

Take, for instance, one of the most common curve-fitting exercises in biology: the one relating the size of a population to time. If we start, say, a culture of bacteria in fresh growth medium, the colony will initially follow an exponential curve; if we extrapolate this curve far enough into the future, though, we arrive at the nonsensical prediction that the colony will soon take over the entire planet. It doesn’t. Why not? Because resources are limited and because there is going to be competition from other species to acquire them. Which is why many biological populations actually follow a logistic growth curve: they start out exponentially, then begin to slow down, and eventually reach an equilibrium determined by environmental constraints (the so-called carrying capacity in ecology). More complex dynamics (some actually leading to extinction) are possible too, but the point is that an ecologist who took the initial exponential growth seriously and used it for predictions in a scientific paper would be laughed out of court immediately, and rightly so. Sharov and Gordon give us no reason to take their extrapolation backwards in time any more seriously.
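The point about extrapolation can be made concrete with the textbook exponential and logistic models, N(t) = N0 · e^(rt) and N(t) = K / (1 + ((K − N0) / N0) · e^(−rt)). The Python sketch below uses hypothetical parameter values (starting size N0, growth rate r, carrying capacity K) chosen only for illustration; it shows that the two curves are practically indistinguishable early on and wildly different later.

```python
import math


def exponential(t: float, n0: float, r: float) -> float:
    """Unchecked exponential growth: N(t) = N0 * exp(r * t)."""
    return n0 * math.exp(r * t)


def logistic(t: float, n0: float, r: float, k: float) -> float:
    """Closed-form logistic growth with carrying capacity K."""
    return k / (1 + (k - n0) / n0 * math.exp(-r * t))


# Hypothetical parameters, chosen only for illustration.
N0, R, K = 1_000.0, 0.5, 1e9

for t in (0, 5, 10, 40, 80):
    e, l = exponential(t, N0, R), logistic(t, N0, R, K)
    print(f"t={t:>3}  exponential={e:12.4g}  logistic={l:12.4g}  ratio={e / l:10.4g}")
```

A fit to the early data points alone cannot tell the two models apart; only observations beyond that range can, and that is exactly the information an extrapolation lacks.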