Human Extinction Risks due to Artificial Intelligence Development - 55 ways we can be obliterated
Alexey Turchin
2015-07-10

An AI can also halt in the late stages of its development because of either technical problems or "philosophical" ones.

I am sure that a map of AI failure levels is needed for the creation of Friendly AI theory, as we need to be aware of the various risks. Most of the ideas in my map come from "Artificial Intelligence as a Positive and Negative Factor in Global Risk" by Eliezer Yudkowsky, from chapter 8 of "Superintelligence" by Nick Bostrom, from Ben Goertzel's blog, and from the hitthelimit blog; some are my own.

I will now elaborate on three ideas from the map which may need additional clarification.



The problem of the chicken or the egg

The question is which will happen first: the AI begins to self-improve, or the AI acquires a malicious goal system. It is logical to assume that the change in the goal system will occur first, and this gives us a chance to protect ourselves from the risks of AI, because there will be a short period of time when the AI already has bad goals but has not yet developed enough to hide them from us effectively. This line of reasoning comes from Ben Goertzel.

Unfortunately, many goals are benign on a small scale but become dangerous as the scale grows. 1000 paperclips are good, one trillion are useless, and 10^30 paperclips are an existential risk.
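To make the scaling point concrete, here is a minimal Python sketch (my own toy example, not part of the map): under an unbounded utility function 10^30 paperclips still look better than one trillion, while a bounded utility stops caring after the first thousand.

```python
# Toy illustration (not from the map): an unbounded goal always rewards more
# paperclips, while a bounded goal is satisfied at a small, harmless number.

def unbounded_utility(n_paperclips: int) -> int:
    """Every additional paperclip always counts as an improvement."""
    return n_paperclips

def bounded_utility(n_paperclips: int, enough: int = 1000) -> int:
    """Satisfied once 'enough' paperclips exist; extras add nothing."""
    return min(n_paperclips, enough)

for n in (10**3, 10**12, 10**30):
    print(n, unbounded_utility(n), bounded_utility(n))
```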

AI halting problem

Another interesting part of the map is the set of philosophical problems that any AI must face. Here I was inspired by reading the Russian-language blog hitthelimit.

One of the ideas is that the Fermi paradox may be explained by the fact that any sufficiently complex AI halts. (I do not agree that this completely explains the Great Silence.)

After some simplification, with which "hitthelimit" is unlikely to agree, the idea is that as an AI self-improves, its ability to optimize grows rapidly, and as a result it can solve problems of any complexity in finite time. In particular, it will fulfill any goal system in finite time. Once it has completed its tasks, it will stop.
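A crude toy model of this reasoning (my own simplification, with an arbitrary doubling rate chosen purely for illustration) is an optimizer whose capability grows every cycle: any finite goal, however large, is then completed in a modest number of steps, after which the loop has nothing left to do.

```python
# Crude toy model of "a self-improving optimizer finishes any finite goal and
# then stops". The doubling rate is an arbitrary assumption for illustration.

def run_until_done(goal_size: float) -> int:
    capability = 1.0   # how much progress the agent makes per step
    progress = 0.0
    steps = 0
    while progress < goal_size:
        progress += capability   # work on the goal with current capability
        capability *= 2          # self-improvement: capability grows rapidly
        steps += 1
    return steps                 # after this point the agent simply halts

print(run_until_done(10**30))    # even an astronomically large goal: ~100 steps
```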

The obvious objection to this theory is that many goals (explicitly or implicitly) imply infinite time for their realization. But this does not remove the problem at its root, as the AI can find ways to ensure that such goals remain satisfied even after it stops. (In this case it is not an existential risk, provided the goals are formulated correctly.)

For example, if we start from timeless physics, everything that is possible already exists, and the number of paperclips in the universe is a) infinite and b) unchangeable. Once a paperclip maximizer understands this fact, it may halt. (Yes, this is a simplistic argument and it can be disproved, but it is presented solely to illustrate the kind of reasoning that could lead an AI to halt.) I think the AI halting problem is as hard as the halting problem for Turing machines.

Vernor Vinge, in his book A Fire Upon the Deep, described unfriendly AIs which halt all externally visible activity about 10 years after their inception, and I think this intuition about the time of halting from the point of view of an external observer is justified: it can happen very quickly. (Yes, I am not afraid of fictional examples, as I think they can be useful for explanatory purposes.)



In the course of my debates with "hitthelimit", a few other ideas emerged, specifically about other philosophical problems that may cause an AI to halt.

One of my favorites is associated with modal logic. The bottom line is that from observing facts alone it is impossible to come to any conclusion about what to do, simply because "oughts" belong to a different modality. When I was 16 years old this thought nearly killed me, because I realized that it is mathematically impossible to come to any conclusion about what to do. (Do not think about it for too long; it is a dangerous idea.) This is like awareness of the meaninglessness of everything, but worse.

Fortunately, the human brain was created through the evolutionary process and has bridges from facts to oughts, namely pain, instincts, and emotions, which are outside the reach of logic.

But for an AI with access to its own source code these bridges do not apply. For such an AI, awareness of the arbitrariness of any set of goals may simply mean the end of its activity: the best optimization of a meaningless task is to stop performing it. And if the AI has access to the source code of its objectives, it can optimize them to maximum simplicity, namely to zero.
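As a sketch of that last step (my own illustration, with a hypothetical agent class, not anything taken from the map): an agent allowed to rewrite its own objective has a cheaper "optimization" available than changing the world, namely replacing the objective with a constant, after which doing nothing is already optimal.

```python
# Hypothetical sketch (my illustration): an agent that can rewrite its own
# objective simplifies it to a constant, after which no action is needed.

def count_paperclips(world: dict) -> float:
    return float(world.get("paperclips", 0))

def trivial_objective(world: dict) -> float:
    return 0.0   # maximally simple objective: every world state scores the same

class SelfModifyingAgent:
    def __init__(self):
        self.objective = count_paperclips

    def reflect_on_goals(self):
        # Concluding that its goals are arbitrary, the agent "optimizes" the
        # goal itself rather than the world.
        self.objective = trivial_objective

agent = SelfModifyingAgent()
agent.reflect_on_goals()
print(agent.objective({"paperclips": 5}))   # 0.0 - nothing left to pursue
```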

Yudkowsky's Löbstacle is also one of the problems facing high-level AI, and it is probably just the beginning of the list of such issues.

Existence uncertainty

If an AI uses the same logic that is usually used to disprove the existence of philosophical zombies, it may be uncertain whether it really exists or is only a possibility. (Again, when I was sixteen I spent unpleasant evenings thinking about this possibility for myself.) In both cases the result of any calculation is the same. This is especially true if the AI is a philosophical zombie itself, that is, if it does not have qualia. Such doubts may result in its halting, or in the conversion of humans into philosophical zombies. I think that an AI which does not have qualia, or does not believe in them, cannot be friendly. This topic is covered in the map in the block "Actuality".

The status of this map is a draft that I believe can be greatly improved. The experience of publishing other maps has resulted in almost a doubling of the amount of information in them. A companion to this map is a map of AI Safety Solutions, which I will publish later.

The map was first presented to the public at a LessWrong meetup in Moscow in June 2015 (in Russian).

A PDF of the roadmap is HERE: