In “Is Intelligence Self-Limiting?” I discussed the possibility that a rapidly increasing intelligence could short-circuit by feeding itself pleasing inputs and ignoring the real world.
There were a number of interesting reactions (see the original article and its comments, as well as the Hacker News discussion, for example). Some dismissed the idea that such self-deception could happen, with most arguments centering on the nature of the motivation driving the intelligence. One reader linked to an insightful research paper on the subject, Stephen Omohundro’s “The Basic AI Drives,” which states in the section “AIs will try to prevent counterfeit utility” that:
It’s not yet clear which protective mechanisms AIs are most likely to implement to protect their utility measurement systems. It is clear that advanced AI architectures will have to deal with a variety of internal tensions. They will want to be able to modify themselves but at the same time to keep their utility functions and utility measurement systems from being modified.
One can think of utility functions as providing motivation, and the question is how a motivator can avoid self-deception. That is the focus of this article.
As Richard Feynman is quoted as having said:
“The first principle is that you must not fool yourself, and you are the easiest person to fool.”
Suppose you are motivated to eat fine meals. In this thought experiment, you have two buttons you can press, either one of which will satisfy this urge.
• Button One. You are presented with a sumptuous meal to consume.
• Button Two. Your memories are directly tweaked so that you imagine you’ve just eaten the same meal, complete with an appetite that returns only slowly.
The experiment is set up so that your internal state is exactly the same in both cases after pushing either of the buttons (and possibly eating a meal).
We know that Button One should be preferred because it actually provides nutrients to the body, whereas Button Two leads to starvation. Since starvation means being unable to consume meals in the future, a far-sighted motivator would prefer Button One.
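The difference between the buttons can be made concrete with a small Python sketch. It is my own toy model, not part of the original thought experiment: the function names, the three days of reserves, and the ten-day horizon are all hypothetical. A motivator that scores only its internal state cannot tell the buttons apart, while one that can see the real-real consequences prefers Button One.

```python
# Toy model of the two-button thought experiment. All names and numbers are
# hypothetical illustrations, not part of the original essay.


def internal_state_after(button: int) -> dict:
    """What the agent can introspect after one press. By construction of the
    experiment, both buttons leave exactly the same internal state."""
    return {"remembers_meal": True, "appetite": "low"}


def days_survived(button: int, horizon: int = 10, reserves: int = 3) -> int:
    """Real-real consequence of pressing the same button every day.
    Each day burns one unit of nutrition; only Button One replaces it."""
    nutrition = reserves
    for day in range(horizon):
        nutrition -= 1
        if nutrition < 0:
            return day  # starved before the end of the horizon
        if button == 1:
            nutrition += 1
    return horizon


def myopic_value(button: int) -> int:
    """A motivator that scores only its own internal state."""
    return 1 if internal_state_after(button)["remembers_meal"] else 0


def far_sighted_value(button: int) -> int:
    """A motivator that can see real-real nutrition and cares about future meals."""
    return days_survived(button)


def preference(value) -> str:
    """Compare the two buttons under a given value function."""
    one, two = value(1), value(2)
    if one == two:
        return "indifferent"
    return "Button One" if one > two else "Button Two"


print(preference(myopic_value))       # indifferent
print(preference(far_sighted_value))  # Button One
```

The far-sighted motivator only wins because `days_survived` reads the real nutrition level directly, which is exactly the kind of access to reality that the rest of this article questions.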
But preferring Button One depends on knowing the actual difference between the two. The motivation must:
• Prefer reality to illusion
• Be able to distinguish between the two
The first of these is a property of the motivation itself. The second is the crux of the matter: is it theoretically possible to never fool oneself?
Our ability to interact with reality is limited by whatever instrumentation we have and by our understanding of what its inputs mean. For example, with our eyes closed, we can touch an apple and probably guess what it is, but we’d have a harder time with some obscure machine part. Our concern with reality extends beyond the present to past data from which we discover trends. It’s impossible to keep copies of past realities around, so we can only catalog their passing with what I will call a nominal reality, in contrast to a hypothetical external reality, or real-real.
In practice, a nominal reality is a language-based record of a measurement projected onto some model. For example, “I felt an apple-like object in my hands” translates to the understanding “I held an apple.” Both are nominal realities, with the second synthesized from the first. If this were one of several such events, logical constructions would imply other nominal realities, such as “This morning I held an apple in my hands four times, which is more than yesterday.”
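The layering described here, from measurement to understanding to logical construction, can be sketched as a small data structure. This is a minimal illustration, assuming that each nominal record keeps a link to the records it was derived from; the class `Nominal`, the `MODEL` dictionary, the dates, and the helper functions are hypothetical names, not an established design.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Nominal:
    """Hypothetical record of nominal reality: a statement in language, the day
    it was recorded, and the nominal records it was synthesized from."""
    statement: str
    day: date
    sources: tuple = ()


# An assumed, hard-coded model mapping sensations to understandings. If it is
# wrong, every record built on it inherits the error.
MODEL = {"felt an apple-like object": "held an apple"}


def interpret(sensation: Nominal) -> Nominal:
    """Model step: project a raw measurement onto an understanding."""
    meaning = MODEL.get(sensation.statement, "unknown object")
    return Nominal(meaning, sensation.day, (sensation,))


def compare_days(records, today: date, yesterday: date, fact: str) -> Nominal:
    """Logical construction: derive a new nominal reality by counting others."""
    todays = [r for r in records if r.day == today and r.statement == fact]
    earlier = [r for r in records if r.day == yesterday and r.statement == fact]
    verdict = "more than" if len(todays) > len(earlier) else "not more than"
    text = f"{fact} {len(todays)} times today, {verdict} yesterday"
    return Nominal(text, today, tuple(todays + earlier))


def depth(record: Nominal) -> int:
    """Number of synthesis steps between a record and raw measurement."""
    return 0 if not record.sources else 1 + max(depth(s) for s in record.sources)


today, yesterday = date(2012, 3, 24), date(2012, 3, 23)  # arbitrary example dates
raw = ([Nominal("felt an apple-like object", today)] * 4
       + [Nominal("felt an apple-like object", yesterday)] * 2)
summary = compare_days([interpret(r) for r in raw], today, yesterday, "held an apple")
print(summary.statement)  # held an apple 4 times today, more than yesterday
print(depth(summary))     # 2
```

Keeping the provenance chain lets an auditor at least see which model steps a conclusion depends on, though nothing in the structure says whether any of them match real-real.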
When can we be sure that nominal reality matches real-real? Failures can occur for a variety of reasons, including:
• The instrumentation may be in error (physical failure)
• The model that connects the observation to existing states may be wrong (logical failure)
• Being language-based, nominal realities are vulnerable to being directly manipulated (motivational failure)
Suppose you have a medical condition that an expensive new drug purports to cure. You begin taking the new medicine and find that after two weeks your symptoms clear up. You might be tempted to conclude that taking the medicine directly fixed the problem. Or not:
• Perhaps all the medicine did was impede your ability to detect the symptoms, which still exist (instrumentation failure)
• It could be that you had to cut back on expensive food and drink in order to afford the medicine, and those luxuries were the actual cause of the ailment (model failure)
• The reduction in symptoms could be purely psychosomatic, the effect of some deeply buried self-deception (internal signal interdiction)
These risks represent opportunities for problem solvers to optimize actions that satisfy their motivation in nominal reality but, unknowingly, not in real-real.
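The first two failure modes in the medicine example are easy to reproduce in a toy simulation. This is a hypothetical sketch, not a medical model: in the made-up world below the drug has no direct effect, symptom severity simply equals spending on rich food and drink, and the psychosomatic case is not modeled at all. The function names and numbers are illustrative assumptions.

```python
# Hypothetical toy version of the medicine example. In this made-up world the
# drug has no direct effect: severity equals spending on rich food and drink.


def two_weeks(on_drug: bool, budget: int = 10, drug_cost: int = 7,
              masks_symptoms: bool = False) -> tuple:
    """Return (real severity, observed severity) after two weeks."""
    luxuries = budget - (drug_cost if on_drug else 0)  # paying for the drug cuts luxuries
    real = luxuries                                     # the actual cause of the ailment
    observed = 0 if (on_drug and masks_symptoms) else real  # what the patient can detect
    return real, observed


def naive_verdict(before: tuple, after: tuple) -> str:
    """Judge the drug by observed severity alone, as the patient would."""
    return "drug works" if after[1] < before[1] else "no effect"


baseline = two_weeks(on_drug=False)
# Model failure: symptoms really improve, but because the diet changed.
print(naive_verdict(baseline, two_weeks(on_drug=True)))  # drug works
# A controlled comparison (free drug, unchanged diet) shows no real effect.
print(naive_verdict(baseline, two_weeks(on_drug=True, drug_cost=0)))  # no effect
# Instrumentation failure: the drug only hides symptoms that are still there.
masked = two_weeks(on_drug=True, drug_cost=0, masks_symptoms=True)
print(naive_verdict(baseline, masked), masked)  # drug works (10, 0)
```

In both failing cases the patient's nominal record says the drug works; only a comparison that holds the diet fixed, or a measurement the drug cannot interfere with, exposes the difference from real-real.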
The job of an intelligent problem solver is to satisfy its motivation. Its ability to do this in real-real is limited by the problems illustrated above. It does not have some magical connection to reality that automatically says when a nominally real “fact” differs from real-real. So unless reality can be successfully audited to solve the problems outlined above, any intelligent system can drift off into optimizing fantasies instead of real-real states. Furthermore, fictions are always easier to produce than realities, and correlations are easier to find than reasons for causation. Optimization may favor the discovery of methods that unwittingly corrupt the connection between nominal reality and real-real:
Conjecture: The complexity of optimizing a motivation is lower than the complexity of certifying that the solution is reality-based.
If the reality audit turns out to be exponentially more complex than finding solutions to optimization problems, any growing intelligence has to spend more and more of its effort on just making sure it’s not living in a fantasy world. In the worst case, a fool-proof epistemology is not even computable.
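The worry that optimization may favor corrupting the measurement can be illustrated with a toy search. This is a sketch under made-up assumptions, not a model of any real system: the three actions, their costs, and the sensor are hypothetical. An optimizer that scores plans by the sensor reading picks the cheap rewiring, and the resulting fantasy is visible only to someone who can consult the real-real column.

```python
from itertools import combinations

# Hypothetical actions: name -> (cost, change to real-real, change to the sensor reading).
ACTIONS = {
    "cook a meal": (5, 1, 1),
    "grow vegetables": (8, 2, 2),
    "rewire the sensor": (1, 0, 10),  # cheap; only the nominal reading moves
}
COST, REAL, NOMINAL = 0, 1, 2


def best_plan(budget: int, objective: int) -> tuple:
    """Exhaustively search affordable subsets of actions for the highest score."""
    best, best_score = (), -1
    for size in range(len(ACTIONS) + 1):
        for plan in combinations(ACTIONS, size):
            cost = sum(ACTIONS[a][COST] for a in plan)
            score = sum(ACTIONS[a][objective] for a in plan)
            if cost <= budget and score > best_score:
                best, best_score = plan, score
    return best


def gap(plan: tuple) -> int:
    """What an audit would reveal: nominal score minus real-real score.
    Computing it requires the REAL column, which the optimizer never consults."""
    return sum(ACTIONS[a][NOMINAL] - ACTIONS[a][REAL] for a in plan)


plan = best_plan(budget=10, objective=NOMINAL)
print(plan)       # ('grow vegetables', 'rewire the sensor')
print(gap(plan))  # 10
```

Here the audit is trivial because the real-real effects sit in the same table as everything else; the conjecture is that, for a real system, computing the equivalent of `gap` is the hard part.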
Is there a provable way to audit nominal realities to protect them from direct (undetected) counterfeiting? Questions like this about general systems are usually non-computable, so how can we define an architecture that is constrained enough to allow successful auditing without changing itself into an unknowable state? For example, within the right constraints, is there a metric that generates a smooth topology of useful computation trajectories, so that the system has a low probability of straying into self-deception?
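One partial, well-known defense against direct counterfeiting of stored records is a tamper-evident log, in which each entry's hash covers everything recorded before it. This is a swapped-in technique, not something the question above presupposes, and it is far from a provable audit: it detects after-the-fact edits only if the latest hash is kept somewhere the counterfeiter cannot reach, and it does nothing about faulty instruments or faulty models. The function names and record fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # arbitrary starting value for the chain


def link_hash(prev_hash: str, record: dict) -> str:
    """Hash a record together with the hash of everything recorded before it."""
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(log: list, record: dict) -> None:
    """Add a nominal record to an append-only, hash-chained log."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "hash": link_hash(prev, record)})


def verify(log: list) -> bool:
    """Recompute every link; editing any earlier record breaks the chain."""
    prev = GENESIS
    for entry in log:
        if link_hash(prev, entry["record"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


log = []
append(log, {"day": "2012-03-23", "statement": "held an apple", "count": 2})
append(log, {"day": "2012-03-24", "statement": "held an apple", "count": 4})
print(verify(log))              # True
log[0]["record"]["count"] = 40  # a direct, after-the-fact counterfeit
print(verify(log))              # False
```

Note what the sketch cannot do: a false measurement appended honestly passes `verify` just as easily as a true one.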
Even if we have a provably correct audit procedure, we still need a good definition of what constitutes knowledge of reality. This is not a trivial problem, as Edmund Gettier’s famous examples illustrate. It seems that the philosophical study of epistemology might have an empirical role in machine learning, and the same applies to the philosophy of science. What proof of cause and effect is necessary to provide insurance against infinitely subtle misconceptions? Or is that simply impossible?
Nominal reality, as opposed to outright fiction, is a valuable commodity in a decision-making system. What does the academic discipline of economics say about the production of “reality bubbles” or other emergent phenomena?
In human brains, how is reality auditing accomplished? (The existing “theory of source monitoring” seems more limited.) Results from sensory deprivation experiments suggest that such auditing is rather fragile. Do dreams seem real because reality-auditing is an expensive operation that gets switched off at night?
How do human organizations define and manage nominal realities? How do they audit them for connections to real-real? Are there patterns of failure? Some examples are given in the next section.
Nominal Reality in the Wild
Human civilization risked self-destruction on the morning of November 9, 1979, when a NORAD training tape that was accidentally left running showed inbound Soviet missiles on US monitoring screens. Nor is this the only example of an errant nominal reality that nearly led to nuclear war.
Metadata indicating nominal realities appears in ordinary conversation (“did that really happen, or is this a joke?”), in claims of authenticity (“based on a true story”), and in simulations like emergency broadcast alerts. We take events marked “pretend” less seriously, as with a fire drill versus a real fire. The demand for real-real erodes language itself, as illustrated by a Boston.com article on the common misuse of the word “literally.” By definition, this word makes a strong claim to real-real, which makes it an attractive target for counterfeiting, as in “The audience was literally eating out of the performer’s hand.” From the article:
“[L]exicographer H.W. Fowler scolded [in the 1920s] that it was something ‘we ought to take great pains to repudiate; such false coin makes honest traffic in words impossible.’”
On the other hand, if you do an Internet search for “Second Life addiction,” you will see that many people have no problem living in a nominally real world that bears only a superficial resemblance to real-real, and they give up real jobs and families to stay there. This may indicate that we can tolerate a large proportion of our perceived world being nominally real as long as it aligns with motivations.
It’s particularly ironic that children in the USA are usually educated in ways that are seen as having a weak connection to real-real. We use language like “in the real world” to refer to life after graduation. Employers want to see experience reflected on a resume, but schooling is seen as “merely” preparation, which is a Catch-22 for many graduating students. My own proposal, described in “The End of Preparation,” is to stop focusing on nominal realities that have little to do with real-real. Exacerbating the current problem, educational test makers are not required to check the nominal results of their instruments against real-real (the College Board’s SAT does so voluntarily, and you can read about its effectiveness in “An Index for Test Accuracy”). Instead, most test makers create the appearance of a connection to real-real by weaker means, such as correlations with other tests. In effect, they create nominal realities with little proof of utility and cover this with an audit that looks impressive to someone who doesn’t understand it. A good case study of a nominal reality fault due to motivational pressures is summarized by the Atlanta Journal-Constitution:
Atlanta’s school cheating scandal, one of the largest in U.S. history, has launched a national discussion about whether the increased use of high-stakes tests to rate educators will trigger similar episodes in the years ahead. Pressure to meet testing targets was a major reason cheating took place in 44 Atlanta schools involving 178 educators [...]
The use of nominal realities is inevitable in a bureaucracy (e.g., jury verdicts), but it invites the problems identified earlier:
• There is an economic demand for anything nominally real, which distorts the market for information:
o Having a monopoly on minting nominal reality via certificates, judgments, licenses, and so on creates social power that is not in line with the purely informational role of “measure and model.” Ask any trucker what it takes to legally cross state lines in the USA.
o Counterfeiting and exaggeration will always be a problem.
o Audits have to be audited, ad infinitum.
• Failure to effectively audit the connection between nominal reality and real-real results in organizational self-deception. Ponzi schemes only succeed when these audits are not done. Vast fortunes were nominally created in the prelude to the 2008 financial debacle, arguably because of a disconnection from real-real.
• Failure to effectively audit the flow and transformation of nominal realities creates optimizations that only succeed virtually. Governments and corporations alike are known to find creative adjustments to budget sheets for the purposes of nominally achieving some goal, while having no effect on real-real.
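The point about Ponzi schemes can be sketched as a fund whose statements (nominal reality) drift away from its cash (real-real) until someone compares the two. This is a minimal illustration assuming a fund that invests nothing and credits fictitious returns; the class, method names, and amounts are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Fund:
    """Hypothetical fund that invests nothing. `balances` is the nominal reality
    reported to investors; `cash` is the real-real money on hand."""
    cash: int = 0
    balances: dict = field(default_factory=dict)

    def deposit(self, investor: str, amount: int) -> None:
        self.cash += amount
        self.balances[investor] = self.balances.get(investor, 0) + amount

    def report_returns(self, percent: int) -> None:
        """Credit 'returns' by editing statements; no money is earned."""
        for name in self.balances:
            self.balances[name] += self.balances[name] * percent // 100

    def withdraw(self, investor: str) -> int:
        """Pay out an investor's nominal balance from whatever cash exists."""
        paid = min(self.balances.pop(investor), self.cash)
        self.cash -= paid
        return paid

    def audit(self) -> int:
        """Shortfall between promised balances and real cash (0 means solvent)."""
        return max(0, sum(self.balances.values()) - self.cash)


fund = Fund()
fund.deposit("early investor", 100)
fund.report_returns(20)                 # statement now shows 120
fund.deposit("late investor", 100)
print(fund.withdraw("early investor"))  # 120, paid partly with the late investor's money
print(fund.balances)                    # {'late investor': 100}
print(fund.audit())                     # 20
```

The late investor's statement still reads 100 and looks fine in isolation; only `audit`, which compares the statements with the cash, reveals the gap.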
I’ve been informed that an engineer may have accidentally switched the wires between Button One and Button Two in the thought experiment earlier, so that the imagined results may have been the opposite of what you imagined. Our apologies for the mistake.
Eubanks, David A. “Is Intelligence Self-Limiting?” Institute for Ethics and Emerging Technologies. Institute for Ethics & Emerging Technologies, 10 Mar. 2012. Web. 24 Mar. 2012. http://ieet.org/index.php/IEET/more/eubanks20120310.
“Is Intelligence Self-Limiting? | Hacker News (comments).” Hacker News. 14 Feb. 2012. Web. 24 Mar. 2012. http://news.ycombinator.com/item?id=3687828.
Omohundro, Stephen M. “The Basic AI Drives.” Proceedings of the First AGI Conference 171 (2008). http://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf.
Gettier, Edmund. “Is Justified True Belief Knowledge? (excerpt).” UC San Diego Philosophy Faculty. Web. 31 Mar. 2012. http://philosophyfaculty.ucsd.edu/faculty/
Phillips, Alan F. “20 Mishaps That Might Have Started Accidental Nuclear War, by Alan F. Phillips, M.D., January, 1998.” Nuclear Age Peace Foundation. Nuclear Age Peace Foundation, Jan. 1998. Web. 24 Mar. 2012. http://www.wagingpeace.org/articles/1998/01/00_phillips_20-mishaps.php.
Muther, Christopher. “Literally the Most Misused Word.” Boston.com. The New York Times, 19 July 2011. Web. 24 Mar. 2012. http://articles.boston.com/2011-07-19/lifestyle/29791304_1_literal-meaning-linguists-character.
Eubanks, David A. “The End of Preparation.” Higher Ed/. 20 Nov. 2011. Web. 24 Mar. 2012. http://highered.blogspot.com/2011/11/end-of-preparation.html.
Eubanks, David A. “An Index for Test Accuracy.” Higher Ed/. 19 Jan. 2012. Web. 24 Mar. 2012. http://highered.blogspot.com/2012/01/index-for-test-accuracy.html.
“Atlanta Public Schools Cheating Scandal.” Atlanta Public Schools Cheating Scandal. Atlanta Journal-Constitution. Web. 24 Mar. 2012. http://www.ajc.com/news/atlanta/atlanta-public-schools-cheating-1026035.html.