Mapping Approaches to AI Safety

Their article identifies three main categories of safety checks on AI:

Social constraints

External constraints

Internal constraints

I have added three more categories:

Using AI to create a safe AI

Multi-level solutions

Meta-level, which describes the general requirements for any AI safety theory.

In addition, I have divided the solutions into simple and complex. Simple solutions are those whose recipe we already know today, for example: “do not create any AI”. Most of these solutions are weak, but they are easy to implement.

Complex solutions require extensive research and the creation of complex mathematical models for their implementation, and could potentially be much stronger. But it is less likely that there will be enough time to develop them and implement them successfully.

Additional ideas for AI safety have been included in the map from the work of Ben Goertzel, Stuart Armstrong and Christiano.

My novel contributions include:

1. Restriction of the self-improvement of the AI. Just as a nuclear reactor is controlled by regulating the intensity of the chain reaction, one may try to control AI by limiting its ability to self-improve in various ways.

2. Capture the beginning of dangerous self-improvement. A potentially dangerous AI has a moment of critical vulnerability at the start, just as a ballistic missile is most vulnerable at the start of its flight. Imagine that an AI has acquired an unauthorized malignant goal system and has started to strengthen itself. At the beginning of this process it is still weak, and if it starts below the level of human intelligence, it may remain less intelligent than the average human even after several cycles of self-improvement. Let's say it has an IQ of 50 and after self-improvement it rises to 90. At this level it is already committing violations that can be observed from the outside (especially unauthorized self-improvement), but it does not yet have the ability to hide them. At this point, you can turn it off. Alas, this idea would not work in all cases, as some objectives may become hazardous only gradually as their scale grows (1000 paperclips are safe, one billion are dangerous, and 10 to the power of 20 are an x-risk). This idea was put forward by Ben Goertzel.

3. AI constitution. First, in order to describe the Friendly AI and human values we can use the existing body of laws. (It would be a crime to create an AI that would not comply with the law.) Second, to describe the rules governing the conduct of AI, we can create a complex set of rules (laws that are much more complex than Asimov’s three laws), which will include everything we want from AI. This set of rules can be checked in advance by a specialized AI, which calculates only the ways in which the application of these rules can go wrong (something like mathematical proofs based on these rules).

4. "Philosophical landmines." In the map of AI failure levels I have listed a number of ways in which a high-level AI may halt when faced with intractable mathematical tasks or complex philosophical problems. One may try to fight a high-level AI using "landmines", that is, by putting it in a situation where it has to solve some problem in which more complex problems are encoded, and solving these will cause it to halt or crash. These problems may include Gödelian mathematical problems, nihilistic rejection of any goal system, or the inability of the AI to prove that it actually exists.

5. Multi-layer protection. The idea here is not that if we apply several methods at the same time, their chances of success will add up. Simply adding methods will not work if all of them are weak. Rather, the idea is that the methods of protection can work together to protect the object from all sides. In a sense, human society works the same way: a child is educated by example as well as by rules of conduct, and then begins to understand the importance of complying with these rules; at the same time the law, the police and the neighbors are watching him, so he knows that criminal acts will put him in jail. As a result, lawful behaviour becomes a goal that he finds rational to pursue.

This idea can be reflected in a specific AI architecture, which will have at its core a set of immutable rules, on top of which a human emulation will be built to make high-level decisions. Complex tasks will be delegated to narrow Tool AIs. In addition, an independent emulation (a conscience) will check the ethics of its decisions. Decisions will first be tested in a multi-level virtual reality, and the ability of the whole system to self-improve will be significantly limited. That is, it will have an IQ of 300, but not a million. This will make it effective in solving aging and global risks, but it will also remain predictable and understandable to us. The scope of its jurisdiction should be limited to a few important areas: prevention of global risks, death prevention, and the prevention of war and violence. But we should not trust it with such an ethically delicate topic as the prevention of suffering, which should be addressed with the help of conventional methods.
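To make the layering a little more concrete, here is a minimal sketch in Python of how the checks in such an architecture might be chained. It is only an illustration under strong assumptions: the `Proposal` type, its fields, the layer functions, the domain labels and the numeric cap are all hypothetical names invented for this example, and each check is a trivial stand-in for what would in reality be a very hard problem. The human emulation that produces decisions and the Tool AIs that carry them out are not modelled at all; the sketch only shows several independent layers each being able to veto a decision.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Proposal:
    """A decision proposed by the high-level decision maker (hypothetical type)."""
    description: str
    domain: str                  # area the decision acts in
    capability_gain: float       # how much it would raise the system's own capability
    harms_humans: bool           # stand-in for breaking a core immutable rule
    flagged_by_conscience: bool  # stand-in for the independent ethical check
    passed_simulation: bool      # stand-in for the virtual-reality trial


# A layer returns None to approve, or a short reason string to veto.
Layer = Callable[[Proposal], Optional[str]]

# Jurisdiction taken from the text; the string labels themselves are assumptions.
ALLOWED_DOMAINS = frozenset({"global_risks", "death_prevention", "war_and_violence"})
# Arbitrary illustrative number standing in for "limited self-improvement".
MAX_CAPABILITY_GAIN = 0.1


def immutable_rules(p: Proposal) -> Optional[str]:
    return "breaks a core rule" if p.harms_humans else None


def jurisdiction(p: Proposal) -> Optional[str]:
    return None if p.domain in ALLOWED_DOMAINS else f"outside jurisdiction: {p.domain}"


def conscience(p: Proposal) -> Optional[str]:
    return "rejected by conscience" if p.flagged_by_conscience else None


def simulation(p: Proposal) -> Optional[str]:
    return None if p.passed_simulation else "failed simulated trial"


def self_improvement_cap(p: Proposal) -> Optional[str]:
    if p.capability_gain > MAX_CAPABILITY_GAIN:
        return "exceeds self-improvement cap"
    return None


LAYERS: List[Layer] = [
    immutable_rules,
    jurisdiction,
    conscience,
    simulation,
    self_improvement_cap,
]


def review(p: Proposal, layers: List[Layer]) -> List[str]:
    """Run every layer and collect all vetoes; an empty list means approval."""
    vetoes = []
    for layer in layers:
        reason = layer(p)
        if reason is not None:
            vetoes.append(reason)
    return vetoes


if __name__ == "__main__":
    risky = Proposal(
        description="rewrite own optimizer to reduce suffering",
        domain="suffering_prevention",
        capability_gain=5.0,
        harms_humans=False,
        flagged_by_conscience=False,
        passed_simulation=True,
    )
    for reason in review(risky, LAYERS):
        print("veto:", reason)
```

In this sketch every layer is run even after one of them has objected, so that a blocked decision reports every side from which it was caught. A single veto is enough to stop it.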


This map could be useful for the following applications:

1. As illustrative material in discussions. People often come up with solutions in an ad hoc way as soon as they learn about the problem of friendly AI, or they focus on one favorite solution.

2. As a quick way to check whether a new solution really has been found.

3. As a tool to discover new solutions. Any systematization creates "free cells" that one can fill with new solutions. One can also combine existing solutions or be inspired by them.

4. As a source of new ideas, several of which are included in the map.

A companion to this map is the map of AI failure levels. In addition, this map is subordinate to the map of global risk prevention methods and corresponds to the block "Creating Friendly AI" (Plan A2) within it.