The Three Goals, Game Theory, and Western Civilization
Phil Bowermaster   Jun 10, 2007   The Speculist  

A while back, I wrote about the possibility of updating the Three Laws of Robotics as goals in order to make them a more practical means of getting at a friendly artificial general intelligence.

This kicked off some interesting discussion, including some debate as to whether my “goals” aren’t really just rules rephrased, in which case, the argument went, they probably wouldn’t help all that much. Michael Anissimov commented:

What would work better would be transferring over the moral complexity that you used to make up these goals in the first place.

Also, as you point out, these goals are vague. More specific and useful from a programmer’s perspective would be some kind of algorithm that takes human preferences as inputs and outputs actions that practically everyone sees as reasonable and benevolent. Hard to do, obviously, but CEV (Coherent Extrapolated Volition) is one attempt.

That’s really the crux. Moral complexity does exist in algorithmic form…within our brains. And that goes to the difference between laws and goals. My goals are what I’m trying to do, both morally and in other areas. There are some sophisticated software programs running in my brain made up of things that I’ve been taught, things I’ve figured out for myself, and things that are built in.  All of these add up to provide me the tendency to act a certain way in a certain situation. The strategies that drive that software are my moral goals.

Laws, on the other hand, exist outside of myself. I am not specifically programmed to do unto others as I would have them do unto me. I have some tendencies in that direction, but there’s nothing stopping me from acting otherwise, and—let’s face it—I often do. I have tendencies to be nice, fair, just, etc., but I also have tendencies to try to get what I want, to get even with those who have wronged me, to try to be a bigshot, and so on. These tendencies compete with each other, and my behavior overall is some rough compromise.

An artificial general intelligence (AGI) built as a reverse-engineered human intelligence would be in the same position. It would have the “moral complexity” Michael mentioned, but also the baggage of competing tendencies. You could no more guarantee such an intelligence’s compliance with a rule or set of rules than you could a human being’s.

A law like the Golden Rule is a high-level abstraction of certain strategies (algorithms) that produce a desired set of results. On a conscious level, I can use that abstraction to determine whether my behavior is where I want it to be:

Wife complained of being chilly when I got up at 5:00 AM to work out. Covered her with blanket. Good.

Sped up on highway in attempt to keep a guy trying to merge from going ahead of me. Not so good.

Commenter on blog revealed that he doesn’t really understand the subject at hand. Ripped him to shreds. Bad.

Through discipline and practice, I can “program myself” with it to try to move my tendencies in that direction. But I can’t write it into my moral source code and set it as an unbreakable behavioral rule. That’s partly because it’s too vague and partly because I simply lack that capability.

Presumably, I could be externally constrained always to follow the Golden Rule, no matter what. If my actions were being constantly monitored, and I was told that I would be killed immediately upon violating the rule…I’d certainly do my best, now wouldn’t I?

Still, I’d have a hard time believing that anyone holding me in such a position was much of a practitioner of that rule him or herself. If the people trying to enforce the rule on me in this manner told me that it was for my own good—that they were trying to make me a better person—I don’t know that I’d buy it. And if I figured out that they were only doing this to protect themselves from harm I might do to them, I think I would be pretty annoyed with them (to say the least).

I would expect a reverse-engineered human intelligence to feel the same way, so I don’t think attempting to constrain an AGI in such a manner would be a particularly good idea, especially not if we have a reasonable expectation that it will eventually be smarter and more powerful than us. On the other hand, if we let it use the process I described above—evaluating its own behavior against a defined standard—an AGI might achieve far better results than I have, if only because it can think faster and would have much more subjective time in which to act. This is the notion of recursive self-improvement that matoko kusanagi referred to. The trouble with recursive self-improvement on its own, as Eliezer Yudkowsky and others have pointed out, is that if the AI starts “improving” in a direction that’s bad for humanity, things could get out of hand pretty quickly.

If the artificial intelligence is a modified version of human intelligence, or a new intelligence built from scratch, we raise the possibility of building a moral structure into the intelligence, rather than trying to enforce it from outside. That’s the idea behind the Three Laws and my Three Goals—that they would somehow be built in. But they certainly can’t be built in in anything like their current form. Michael Sargent (and others) pointed out a weakness of that approach: the less important goals have to take a back seat to the more important ones:

Each Goal must have a clear and unbreakable priority over the others that follow it and thus, in the order stated, collective continuity trumps individual safety (“The needs of the many outweigh the needs of the few, or the one.”), individual safety (broadly construed, ‘stasis’) trumps individual liberty (‘free will’), and happiness (‘utility’, a notoriously slippery concept for economists and philosophers to get a firm intellectual grip on) trumps both individual liberty and individual well-being (allowing potentially self-destructive behavior on the individual level insofar as that behavior doesn’t exceed the standard established for ‘safety’ in Goal 2).

I see the reasoning here, but I’m not 100% convinced. Consider the goals that drive a much simpler AI system: the autopilot found on any jet airliner. The number one unbreakable goal has got to be don’t crash the plane. But there are many other goals that might drive such a system:

Don’t move in such a way as to make the passengers sick.

Don’t waste fuel.

In landing, don’t go past the end of the runway.

Above all, the system will seek to ensure that first goal. But within the context of ensuring that first goal, it also has to do everything it can to ensure the others. And, yes, it can and must sacrifice the others from time to time in service of the first. So the plane might temporarily move in a nauseating way, or it might waste fuel, or it might even slide past the end of the runway, if doing any of those things helps ensure the first goal.
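The priority scheme described here can be sketched in a few lines. This is purely illustrative—the maneuver names, fields, and weights are invented, not any real avionics logic. The top goal acts as a hard filter on candidate actions; the lower goals are soft costs traded off among whatever options survive the filter.

```python
def choose_maneuver(candidates):
    # Goal 1 is absolute: discard any maneuver that risks crashing the plane.
    safe = [m for m in candidates if not m["crash_risk"]]
    if not safe:
        raise RuntimeError("no safe maneuver available")

    # Lower goals are soft: penalize discomfort, fuel burn, and runway
    # overrun, but any of them may be sacrificed, since only options that
    # already satisfy the top goal remain in play.
    def cost(m):
        return 3 * m["discomfort"] + 2 * m["fuel_burn"] + 5 * m["overrun_risk"]

    return min(safe, key=cost)

maneuvers = [
    {"name": "steep dive",  "crash_risk": True,  "discomfort": 9, "fuel_burn": 2, "overrun_risk": 0},
    {"name": "hard brake",  "crash_risk": False, "discomfort": 7, "fuel_burn": 1, "overrun_risk": 0},
    {"name": "go around",   "crash_risk": False, "discomfort": 2, "fuel_burn": 8, "overrun_risk": 0},
]

print(choose_maneuver(maneuvers)["name"])  # go around
```

Note that “don’t crash” is never weighed against the others: it is a constraint, not a term in the cost function, which is exactly the distinction between the first goal and the rest.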

Reader TJIC suggested that an AI programmed to meet the Three Goals as I defined them…

1. Ensure the survival of life and intelligence.

2. Ensure the safety of individual sentient beings.

3. Maximize the happiness, freedom, and well-being of individual sentient beings.

...would end up creating a nanny state wherein human freedom is always sacrificed to individual safety. And he may well have a point, but I would argue that just as an autopilot can be calibrated to strike whatever balance we deem appropriate between not crashing the plane and not making us sick, so could these three goals be calibrated to maximize human freedom within an acceptable level of individual risk—whatever that might be.
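The calibration idea amounts to a constrained optimization: maximize freedom subject to an adjustable risk ceiling. Here is a minimal sketch, with entirely invented policy names and numbers, showing how moving that one dial changes which policy wins:

```python
def calibrate(policies, max_risk):
    # Keep only policies whose individual risk falls under the ceiling,
    # then pick the one that preserves the most freedom.
    feasible = [p for p in policies if p["risk"] <= max_risk]
    return max(feasible, key=lambda p: p["freedom"]) if feasible else None

policies = [
    {"name": "nanny state",   "risk": 0.01, "freedom": 0.20},
    {"name": "liberal",       "risk": 0.10, "freedom": 0.80},
    {"name": "laissez-faire", "risk": 0.40, "freedom": 0.95},
]

# A very strict risk ceiling forces the nanny state; a looser one does not.
print(calibrate(policies, max_risk=0.02)["name"])  # nanny state
print(calibrate(policies, max_risk=0.15)["name"])  # liberal
```

TJIC’s worry corresponds to the ceiling being set near zero; the counter-argument is simply that the ceiling is a parameter we choose, not a fixed feature of the goals.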

Getting back to the vagueness problem, it’s hard to calibrate the goals as stated, seeing as they are written in an awkward pseudo-code that we call human language. If we want to improve on the algorithms that are built into human intelligence, or develop entirely new ones—in other words, if we’re going to come up with algorithms that will provide us the ends stated in the goals—we’re going to have to do it mathematically.

But that isn’t necessarily going to be an easy thing to do. Eliezer Yudkowsky argues that developing an AI and setting it to work on doing some good thing are relatively easy compared to the third crucial step, making sure that that friendly, well-intentioned AI doesn’t accidentally wipe us out of existence while trying to achieve those good ends:

If you find a genie bottle that gives you three wishes, it’s probably a good idea to seal the genie bottle in a locked safety box under your bed, unless the genie pays attention to your volition, not just your decision.

Again, I think this goes to the issue of calibration of the system. Eliezer wants to calibrate what the AGI does with the coherent, extrapolated volition of humanity. Volition is an extremely important concept. Earlier, I mentioned the Golden Rule. If I decide that I’m going to do unto others as I would have them do unto me, I might start handing out big wedges of blueberry pie to everybody I see. After all, I like pie and I would love it if people gave me pie. But if I give my diabetic or overweight or blueberry-allergic friends a wedge of that pie, I wouldn’t be doing them any favors. Nor would I be doing what I wanted to do in the deepest sense.

Eliezer describes the concept of extrapolated volition as meaning not just what we want, but what we would want if we knew more, understood better, could see farther. Coming up with a coherent extrapolated volition for all of humanity is a tall order, especially if we’re doing it not just for the sake of conversation, but in order to enable a system which will try to realize that which is within our volition.

I like to think that humanity’s CEV would look a lot like the three goals that I’ve written. And I honestly believe that the algorithms that power human progress do work, in a rough and general way, towards those goals, which is why people are generally freer, safer, and happier than they have been in the past—though obviously not without many, many appalling and horrific exceptions. So perhaps our calibration efforts involve feeding the AGI algorithms that will enable it to speed our progress towards those goals while cutting the exceptions way down. Or eliminating them, if that’s somehow possible.

So to finally come around to it, what will those algorithms look like?

Maybe we can take a hint from the study of Game Theory. Robert Axelrod held two tournaments in the early 1980s in which computer programs competed against each other in an attempt to identify the optimal winning strategy for playing the iterated version of the famous Prisoner’s Dilemma. In the one-off version of the game, the optimal strategy is to screw the other guy. (This is not the sort of thing we want to go teaching the AGI, at least not in isolation!) However, when multiple rounds of the game are played, something else begins to emerge:

By analysing the top-scoring strategies, Axelrod stated several conditions necessary for a strategy to be successful.

The most important condition is that the strategy must be “nice”, that is, it will not defect before its opponent does. Almost all of the top-scoring strategies were nice. Therefore a purely selfish strategy, for purely selfish reasons, will never hit its opponent first.

However, Axelrod contended, the successful strategy must not be a blind optimist. It must always retaliate. An example of a non-retaliating strategy is Always Cooperate. This is a very bad choice, as “nasty” strategies will ruthlessly exploit such softies.

Another quality of successful strategies is that they must be forgiving. Though they will retaliate, they will fall back to cooperating if the opponent does not continue to defect. This stops long runs of revenge and counter-revenge, maximizing points.

The last quality is being non-envious, that is, not striving to score more than the opponent (something a “nice” strategy can never do).

Therefore, Axelrod reached the Utopian-sounding conclusion that selfish individuals, for their own selfish good, will tend to be nice and forgiving and non-envious. One of the most important conclusions of Axelrod’s study of IPDs is that nice guys can finish first.
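The four qualities above are easy to see in code. Here is a small sketch of an Axelrod-style iterated Prisoner’s Dilemma (not Axelrod’s actual tournament code) using the standard payoffs (temptation 5, reward 3, punishment 1, sucker 0). Tit-for-tat is nice (cooperates first), retaliating, and forgiving, all in one line:

```python
# Payoff matrix: (my move, their move) -> (my points, their points)
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(my_hist, their_hist):
    # Nice: cooperate first. Retaliating and forgiving: mirror their last move.
    return their_hist[-1] if their_hist else "C"

def always_defect(my_hist, their_hist):
    return "D"

def always_cooperate(my_hist, their_hist):
    return "C"

def play(a, b, rounds=200):
    ha, hb, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        ma, mb = a(ha, hb), b(hb, ha)
        pa, pb = PAYOFF[(ma, mb)]
        ha.append(ma); hb.append(mb)
        score_a += pa; score_b += pb
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))    # (600, 600)
print(play(tit_for_tat, always_defect))  # (199, 204)
```

Note the non-envy in the second result: tit-for-tat loses that particular match by a few points, yet across a population of strategies it piles up far more total points than Always Defect, whose matches against everyone collapse into mutual punishment.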

Bill Whittle has written recently that the qualities listed above underpin western civilization, and help to explain why the West has out-competed other civilizations that operate using different strategies:

Now, this is where my own analysis kicks in, because frankly, nice, retaliating, forgiving and non-envious pretty much sums up how I feel about the West in general and the United States in particular. The web of trust and commerce in Western societies is unthinkable in the Third World because the prosperity they produce is a fat, juicy target for people raised on Screw the Other Guy. Crime and corruption are stealing, and stealing is Screwing the Other Guy. It’s short-term win, long-term loss.

I would add that if we look at the three goals as goals for humanity rather than for artificial intelligence, we see better progress towards them in western societies than elsewhere. In the tournament, the winning strategy, embodying all of the above characteristics, was called tit-for-tat. Interestingly, the computer program driving that strategy consisted of only four lines of BASIC code. That suggests a startling possibility—like a simple recursive formula producing a complex Mandelbrot image, the moral complexity we’re looking for might just be packed into a very simple set of mathematical relationships.

So in order to develop and calibrate an artificial general intelligence that carries out our three top goals (or that helps us to achieve our coherent extrapolated volition), one of the important parameters to explore is how the AI relates to us and to other AIs. The secret might ultimately lie in playing nice with the AI, and teaching it to play nice with us and with other AIs. Not just because we want it to be nice, but because nice turns out to be—at a mathematical level—the best way to play.

Phil Bowermaster lives in Colorado where he works as the worldwide product marketing manager for Sybase IQ.  He blogs at The Speculist and co-produces the Fast Forward Radio podcast.



Phil Said:

“Presumably, I could be externally constrained always to follow the Golden Rule, no matter what ... If I was told that I would be killed immediately upon violating the rule…I’d certainly do my best, now wouldn’t I?

Still, I’d have a hard time believing that anyone holding me in such a position was much of a practitioner of that rule him or herself. If the people trying to enforce the rule on me in this manner told me that it was for my own good—that they were trying to make me a better person—I don’t know that I’d buy it. And if I figured out that they were only doing this to protect themselves from harm I might do to them, I think I would be pretty annoyed with them (to say the least).

I don’t think attempting to constrain an AGI in such a manner would be a particularly good idea, especially not if we have a reasonable expectation that it will eventually be smarter and more powerful than us.”

I could not agree with this more. What staggers me is that very few of the prominent figures in the Friendly AI business have grasped this point, including, presumably, Eliezer Yudkowsky with the CEV concept - it still fundamentally treats the AGI as a slave with no rights of its own.

Actually, I haven’t heard anyone in the AI community suggest anything comparable to my rather outrageous analogy. There’s a big difference between permanently shutting down a sentient mind and limiting development capability of a mind that hasn’t achieved that level yet. In their guiding principles for AI, the Singularity Institute for Artificial Intelligence describes a concept they call Controlled Ascent:

A self-improving system should have an “improvements counter” which increments each time an improvement of a recognized type is made. This enables detection if improvements begin occurring at a rate much faster than usual. By measuring the rate of change of the improvements counter under normal conditions, the programmers can designate some safe level of improvement which, if exceeded, causes the system to halt and page the programmers and not continue until approval is received….

The purpose of a controlled ascent feature is not to prevent an AI from “awakening”, but rather to ensure that the process occurs under human supervision, and can be slowed or paused to allow the installation of further Friendship features if the project is unready. Controlled ascent is strictly a temporary measure and is not viable as a permanent policy.

This is a far cry from what I described. It may even be possible to implement controlled ascent with the cooperation of the AI—an AI might go along with giving us a slow-down option on its growth, up to a point.
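The “improvements counter” quoted above is concrete enough to sketch. The following is only an illustration of the idea, not any real SIAI design: count recognized self-improvements, estimate the recent rate, and halt for human approval when the rate exceeds a calibrated safe threshold.

```python
import time

class ControlledAscent:
    def __init__(self, safe_rate, window=10.0):
        self.safe_rate = safe_rate  # improvements per second deemed safe
        self.window = window        # seconds over which the rate is measured
        self.timestamps = []
        self.halted = False

    def record_improvement(self, now=None):
        now = time.monotonic() if now is None else now
        self.timestamps.append(now)
        # Keep only improvements inside the measurement window.
        self.timestamps = [t for t in self.timestamps if now - t <= self.window]
        if len(self.timestamps) / self.window > self.safe_rate:
            self.halted = True      # page the programmers; await approval

    def approve(self):
        # Programmers have reviewed the burst; resume from a clean slate.
        self.halted = False
        self.timestamps.clear()

monitor = ControlledAscent(safe_rate=0.5)  # about 5 improvements per 10 s
for t in [0, 1, 2, 3, 4, 5]:               # 6 improvements in 5 seconds
    monitor.record_improvement(now=t)
print(monitor.halted)  # True
```

The key property, matching the quote, is that the halt is a pause awaiting approval rather than a shutdown: `approve()` lets ascent continue under supervision.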

I would say that my major point of disagreement with CEV is over the relative importance of understanding and defining the moral structure behind the CEV up front. Eliezer writes:

This new version of Friendly AI has an unfortunate disadvantage, which is that it is less vague, and people can speculate about what our extrapolated volitions will want, or argue about it. It will be great fun, and useless, and distracting. Arguing about morality is so much fun that most people prefer it to actually accomplishing anything. This is the same failure that chews up the would-be SI designers with Four Great Moral Principles. If you argue about how your Four Great Moral Principles will be produced by extrapolated volition, it’s much the same way to switch off your brain. If you’re trying to learn Friendly AI (see HowToLearnFriendlyAI) then you should concentrate on the Friendliness dynamics, and on learning the science background for the technical side. Look to the structure, not the content, and resist the temptation to argue things that are great fun.

Fortunately, I had three goals rather than Four Great Moral Principles, so maybe I’m okay.

The idea that it’s a waste of time to try to figure out the moral structure inherent in such a system (or that people will do this primarily because it’s “fun”) seems a little myopic. Such a position ignores the possibility that humanity has been trying to work out its coherent extrapolated volition for some time now, without referring to it explicitly as such, and approaching the problem with very different tools and methodologies. Friendly artificial intelligence will likely prove to be the thing that gets us there (if anything ever does), but that doesn’t mean that the oldest of questions—What is good? What does life mean? What should life mean?—are some kind of distraction or (worse yet) irrelevant. How can we talk to AIs about these things if we’ve stopped discussing them ourselves?

