Problems with Defining an Existential Risk
Phil Torres
2015-01-21

In his 2002 paper, Bostrom defines the term in a couple of ways. The most prominent definition specifies an existential risk as “one where an adverse outcome would either annihilate Earth-originating intelligent life or permanently and drastically curtail its potential” (my italics). Bostrom provides a matrix to graphically represent this definition, Figure A, which comes from his 2008 co-edited book Global Catastrophic Risks. As you can see, an existential risk is defined as being transgenerational in scope and terminal in intensity.

[Figure A: Bostrom’s risk matrix from Global Catastrophic Risks, with the scope of a risk’s consequences (personal, local, global, transgenerational) on the y-axis and their intensity (including endurable and terminal) on the x-axis; existential risks occupy the transgenerational–terminal cell, labeled “human extinction,” in the top right.]

By “transgenerational,” Bostrom means that it affects “not only the current world population but all generations that could come to exist in the future.” We’ll examine some problems with this term below. By “terminal,” Bostrom doesn’t mean “an end to our lineage,” as one might assume given the standard dictionary definition of the word. If he did, the conception of existential risks as transgenerational-terminal events would directly contradict the second part of the definition above, which counts certain survivable events as existential risks so long as they involve a permanent and drastic compromising of our potential. Instead, Bostrom’s idiosyncratic definition characterizes “terminal” as either “causing death or permanently and drastically reducing quality of life,” as he puts it. Essentially, “terminal” can also cover what would ordinarily count as “endurable” outcomes, with some extra conditions attached. So the prima facie contradiction disappears by stipulation: a “terminal” event is one resulting in either death or a lasting and drastic reduction of quality of life. Let’s call this Definition #1.



In my reading of the literature, the term “existential risks” is more often used in a different way, as referring to scenarios in which our species kicks the bucket. For example, the Centre for the Study of Existential Risk states on its website that “an existential risk is one that threatens the existence of our entire species.” Bostrom himself gestures at this definition when he writes, in his 2002 paper, that “an existential risk is one where humankind as a whole is imperiled” (my italics).i Thus, this definition focuses on Homo sapiens in particular, rather than “Earth-originating intelligent life” more generally, which could include a multiplicity of beings in addition to humanity. Let’s call this Definition #2.



Notice that Definition #2 has objective criteria, whereas Definition #1 is thoroughly normative. This isn’t necessarily a bad thing, but it does introduce some complications. For example, what exactly does “intelligent life” mean? If humanityii dies out in a decades-long volcanic winter but chimpanzees survive, would this count as an existential risk? Chimps don’t perform as well as humans on IQ tests, but surely it would be wrong to claim they aren’t “intelligent.”



One might respond that the potential of Earth-originating life would be severely compromised if chimps were to take over the planet. But as before, what exactly does “potential” mean? One answer comes from transhumanism: reaching our potential has something to do with becoming posthuman, with continuing a trend of ever-increasing productivity, efficiency, information processing, economic and population growth, and so on.iii But not everyone agrees with these measures of “progress.” The anarcho-primitivist, for instance, might assert that life will reach its full potential once humanity returns to nature, adopting the old hunter-gatherer mode of subsistence we abandoned 12,000 years ago. Others might argue, as Naomi Oreskes and Erik Conway (implicitly) do in their book The Collapse of Western Civilization, that the relevant measure of progress shouldn’t be gross domestic product, but “the Bhutanian concept of gross domestic happiness.”



While one can put forth more or less cogent arguments for these positions, the point is that there exist no “facts of the matter” at which one can point to settle disputes about what “potential” means. The same goes for “intelligent life”: where do we draw the line? Is counting human beings but not chimps as intelligent life too anthropocentric? These are non-trivial issues, as they call into question the core meaning of Definition #1.



(One might also object that the word “permanent” is too strong. Why? Because it disqualifies from counting as existential risks those scenarios in which human civilization is severely crippled for millions, even billions, of years but then recovers. As long as the damage isn’t forever, the risk ain’t existential, even if it’s nearly forever. The further out one goes in time, the less plausible the word “permanent” appears.)



***



There are more pressing issues, though: both Definitions #1 and #2 are susceptible to Gettier-like counterexamples, or outcomes that quack like a duck but aren’t, and ones that don’t quack like a duck but are. Beginning with Definition #2, consider the fact that in evolutionary biology every state is an in-between state. There are no final forms, only transitional ones. This means that (even bracketing the phenomenon of cyborgization) Homo sapiens is not a permanent fixture of the biosphere: we are a mere link between the not-yet-extant and the already extinct.



Imagine for a moment that millions of years pass and natural selection drastically modifies our phenotypes. We become a new species, call it Gedanken experimentus. Imagine further that this species is more intelligent, wise, physically fit, and altruistic than we have ever been. For example, it figures out how to create dual-use technologies that could destroy the universe, while also figuring out how to prevent them from ever being used this way. Gedanken experimentus is, in a word, better than us, relative to some non-controversial criteria of evaluation.



The problem is that if one equates an existential risk with human extinction, as Definition #2 does, the above situation counts as an existential risk, since the appearance of Gedanken experimentus millions of years from now through anagenesis (let’s say) would mean the extinction of Homo sapiens. Yet this scenario looks positively desirable. In fact, transhumanism explicitly promotes the extinction of our species; this is what the creation of posthumanity is all about. We “transcend” our current nature, which is “a work-in-progress, a half-baked beginning that we can learn to remold in desirable ways” (from Bostrom’s “Transhumanist Values”). Let’s call this the Problem of Good Extinction.iv



An exactly opposite issue arises for Definition #1. Consider this possible future: let’s say we successfully upload a conscious mind to a supercomputer. Because this mind supervenes upon a silicon (or carbon nanotube) substrate, it’s able to think orders of magnitude faster than we can. In addition, given the multiple realizability of cognitive systems, let’s say this mind clones itself to produce a large population of artificial intelligences (identical pasts, but unique futures). This new society of simulated brains, operating on a faster timescale than humanity, quickly takes over the task of creating a superintelligence, and through the positive feedback process of recursive self-improvement they rapidly become far more clever than us.



Suddenly, then, humanity is confronted by superbeings that tower over us to the extent that we tower over the guinea pig, cognitively speaking. Here’s the catch, though: the first emulated brain – the one from which all the others were spawned – came from a religious extremist. Consequently, the entire population of superbeings inherited its radical beliefs about the way the world is and ought to be. This may seem implausible, but it’s perfectly consistent with Bostrom’s “orthogonality thesis,” according to which end goals and intelligence are not tightly coupled. In other words, a superintelligence could be combined with just about any set of aims and motivations, however strange, irrational, and dangerous they may appear to us.



Because of their religious extremism, then, these superbeings – which qualify as posthumans on Bostrom’s definitionv – quickly destroy humanity. Furthermore, because of their cleverness, they go on to realize all the marvels of technology, foment unprecedented economic growth, colonize the cosmos, and generally live happily ever after. (This gestures at some scenarios that I discuss in a recent IEET article, in which human annihilation by future AIs may actually be morally recommendable.)



The point is that we have a situation here in which Earth-originating intelligent life is neither annihilated nor has its potential been permanently and drastically curtailed.vi It follows, on Definition #1, that no existential risk has occurred, even though our species – indeed, our entire lineage – has been violently destroyed. Yet Bostrom himself counts this as an existential risk, and he goes into great detail in his excellent book Superintelligence about how we can prevent our cognitive children from engaging in an act of parricide. Let’s call this the Problem of Bad Extinction.vii



It’s worth noting here that the appearance of “human extinction” in the top right box of Figure A is a bit misleading. While human extinction could constitute an existential catastrophe, this will only be the case if we are the only instance of “Earth-originating intelligent life.” This term is broader than us, though: it could include both Gedanken experimentus and the technoprogressive (but malicious) posthumans mentioned above. Thus, subsequent depictions of this typology should make clear that the top right box includes, but is not limited to, human extinction scenarios.



***



Finally, there is a nontrivial problem with Bostrom’s typology of risks itself. Take a close look at the y-axis of Figure A. Bostrom divides the scope of an event’s consequences into personal, local, global, and transgenerational categories. These are presented as alternative possibilities, but not all of them are alternatives along a single dimension: the first three concern geography, so to speak, whereas the fourth concerns history. And different geographies can be mixed with different kinds of histories.



We can put it like this: just as a risk is defined as the probability of an event times its consequences, and the consequences of an event can be analyzed into scope and intensity, so too can the scope of an event’s consequences be analyzed into spatial and temporal components. Thus, we can ask: “For a given time slice, how many people has this event affected?” and “Over a given time segment, how many people has this event affected?” These are completely distinct questions, since the first concerns the spatial scope of a consequence, whereas the second concerns its temporal scope.
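Put schematically – the notation here is mine, offered only as a gloss on the decomposition just described, not as Bostrom’s own formalism – the idea is:

\[
\text{risk} \;=\; \Pr(\text{event}) \times \text{consequences}, \qquad
\text{consequences} \;=\; \langle \text{scope},\ \text{intensity} \rangle, \qquad
\text{scope} \;=\; \langle \text{scope}_{\text{spatial}},\ \text{scope}_{\text{temporal}} \rangle
\]

The two questions above simply pick out the first and second components of the scope pair, respectively.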



For example, a germline mutation could, if survivable, be classified as a personal-endurable risk. But because it affects the germ cells, it is transgenerational in (temporal) scope: in a given time slice it might affect only one person, but peering across time it could affect many people (namely that individual’s descendants). By contrast, a somatic mutation that results in cancer, if survivable, would be a personal-endurable risk that’s generational in scope, since it wouldn’t affect “all generations that could come to exist in the future” (within the relevant spatial scope). Moving up a level, the bombing of Hiroshima could be classified as a local-endurable event, since it caused significant harm but didn’t destroy the quality of life completely (to borrow Bostrom’s definition of “endurable”viii). And its effects have clearly been transgenerational – something dealt with by multiple generations. In contrast, Superstorm Sandy was a local-endurable event in New York City that was merely generational in (temporal) scope.



Once we get to the spatial category of “global,” the difference between generational and transgenerational is the difference between individuals in the population and the population itself. (Here a species is understood as the largest possible population of a biological kind.) This makes sense of the phenomenon of aging, which prompted Bostrom to expand his typology between his 2002 and 2008 publications. On the present account, aging is unproblematic: it constitutes a global-terminal risk whose temporal scope is generational. That is to say, every individual is affected by death, but the population itself isn’t. A runaway greenhouse effect or grey-goo disaster, in contrast, would count as a global-terminal risk whose temporal scope is transgenerational: it affects everyone everywhere forever.
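Spelled out concretely, the classification might look like the following sketch (my own illustration, not Bostrom’s; the category names are adapted from the discussion above, and the “existential” cell anticipates the redefinition offered just below):

```python
from dataclasses import dataclass
from enum import Enum

class Spatial(Enum):
    PERSONAL = "personal"
    LOCAL = "local"
    GLOBAL = "global"

class Temporal(Enum):
    GENERATIONAL = "generational"
    TRANSGENERATIONAL = "transgenerational"

class Intensity(Enum):
    ENDURABLE = "endurable"
    TERMINAL = "terminal"

@dataclass
class Risk:
    """A risk's consequences, analyzed into spatial scope, temporal scope, and intensity."""
    name: str
    spatial: Spatial
    temporal: Temporal
    intensity: Intensity

    def is_existential(self) -> bool:
        # Revised definition: spatially global, temporally transgenerational,
        # and terminal in intensity.
        return (self.spatial is Spatial.GLOBAL
                and self.temporal is Temporal.TRANSGENERATIONAL
                and self.intensity is Intensity.TERMINAL)

# The examples discussed above, classified along the two scope dimensions plus intensity.
examples = [
    Risk("germline mutation (survivable)", Spatial.PERSONAL, Temporal.TRANSGENERATIONAL, Intensity.ENDURABLE),
    Risk("bombing of Hiroshima", Spatial.LOCAL, Temporal.TRANSGENERATIONAL, Intensity.ENDURABLE),
    Risk("Superstorm Sandy", Spatial.LOCAL, Temporal.GENERATIONAL, Intensity.ENDURABLE),
    Risk("aging", Spatial.GLOBAL, Temporal.GENERATIONAL, Intensity.TERMINAL),
    Risk("runaway greenhouse effect", Spatial.GLOBAL, Temporal.TRANSGENERATIONAL, Intensity.TERMINAL),
]

for risk in examples:
    print(f"{risk.name}: existential = {risk.is_existential()}")
```

Only the last example comes out existential: global in spatial scope, transgenerational in temporal scope, and terminal in intensity.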



An existential risk can thus be redefined more precisely as any catastrophe that’s spatially global, temporally transgenerational, and terminal in intensity. This fixes the typology, but how does it deal with the two counterexamples above? To address these problems, we need to be specific about the particular population of beings affected by the existential event, since both problems derive from how “everyone everywhere” is construed. (In Definition #1 it’s too broad; in Definition #2 it’s too narrow.)



Perhaps a solution is to specify the relevant group as either our current population or any future population that we care about.ix By defining the group affected in terms of what we value, the problems of good and bad extinction can be effectively solved: insofar as Gedanken experimentus is a desirable species to become, this scenario doesn’t count as an existential risk; and insofar as we don’t want our lineage to be destroyed by a race of superintelligent posthumans (however superior their moral status, and however well aligned with transhumanist values of infinite progress they may be), this scenario does constitute such a risk. Let’s call this Definition #3.



This offers a modified account both of what an existential risk’s consequences are, and of what the relevant group affected by these consequences is. I put it forward only as a tentative idea – a point of departure for further discussion. I have argued that there are a number of problems with the two primary definitions currently being used in the literature – problems that, to my knowledge, no one has yet pointed out. Perhaps Definition #3 is a small step in the right direction.



(For more on this and a number of related issues, see my forthcoming book The End: What Religion and Science Tell Us About the Apocalypse.)




i One finds this in other papers of his as well, as in the abstract of Bostrom’s “Existential Risk Prevention as Global Priority.”





ii Here I use “humanity” in the non-anthropological sense, to mean Homo sapiens specifically rather than our entire genus.





iii See Bostrom’s “Transhumanist Values.” Search for “technological progress.”





iv Notice that Definition #1 handles this scenario perfectly well. It doesn’t count it as an existential risk, which seems right.





v See “Why I Want to be a Posthuman When I Grow Up.”





vi Accepting, say, transhumanism’s definition of “potential.”





vii Notice that Definition #2 handles this scenario perfectly well. It counts it as an existential risk, which seems right. Also notice that if bad extinction weren’t a problem, the project of Friendly AI would be pointless.





viii We could, alternatively, simply say that the risk was endured by the locality, which is, after all, the relevant (spatial) scope.





ix Or would care about if we knew about them.