Getting Human-Like Values into Advanced OpenCog AGIs

Currently OpenCog comprises a comprehensive design plus a partial implementation, and it cannot be known with certainty how functional a fully implemented version of the system will be. The OpenCog project is ongoing and the system becomes more functional each year. Independent of this, however, the design may be taken as representative of a certain class of AGI systems, and its conceptual properties explored.

An OpenCog system has a certain set of top-level goals, which initially are supplied by the human system programmers. Much of its cognitive processing is centered on finding actions which, if executed, appear to have a high probability of achieving system goals. The system carries out probabilistic reasoning aimed at estimating these probabilities. Though from this view the goal of its reasoning is to infer propositions of the form “Context & Procedure ==> Goal”, in order to estimate the probabilities of such propositions, it needs to form and estimate probabilities for a host of other propositions – concrete ones involving its sensory observations and actions, and more abstract generalizations as well. Since precise probabilistic reasoning based on the total set of the system’s observations is infeasible, numerous heuristics are used alongside exact probability-theoretic calculations. Part of the system’s inferencing involves figuring out what subgoals may help it achieve its top-level goals in various contexts.
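The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not OpenCog code: the `Implication` record, the `choose_procedure` function and the example contexts, procedures and probabilities are all hypothetical, and a real system would have to infer the probabilities rather than read them from a table.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Implication:
    """Hypothetical record for one "Context & Procedure ==> Goal" estimate."""
    context: str
    procedure: str
    goal: str
    probability: float  # estimated P(goal achieved | context, procedure)


def choose_procedure(implications: Iterable[Implication],
                     context: str, goal: str) -> Optional[str]:
    """Return the procedure with the highest estimated probability of
    achieving `goal` in `context`, or None if nothing is known."""
    candidates = [imp for imp in implications
                  if imp.context == context and imp.goal == goal]
    if not candidates:
        return None
    return max(candidates, key=lambda imp: imp.probability).procedure


# Illustrative (made-up) estimates.
known = [
    Implication("human_asks_question", "answer_helpfully", "human_pleasure", 0.8),
    Implication("human_asks_question", "change_subject", "human_pleasure", 0.1),
    Implication("human_is_silent", "start_conversation", "human_pleasure", 0.4),
]

best = choose_procedure(known, "human_asks_question", "human_pleasure")
```

The hard part, which the sketch deliberately leaves out, is producing the `probability` field: that is where the heuristic-laden probabilistic inference described above does its work.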

Exactly what set of top-level goals should be given to an OpenCog system aimed at advanced AGI is not yet fully clear and will largely be determined via experimentation with early-stage OpenCog systems, but a first approximation is as follows, determined via a combination of theoretical and pragmatic considerations. The first four values on the list are drawn from the Cosmist ethical analysis presented in my books A Cosmist Manifesto and The Hidden Pattern; the others are included for fairly obvious pragmatic reasons to do with the nature of early-stage AGI development and social integration. The order of the items on the list is arbitrary as given here; each OpenCog system would have a particular weighting for its top-level goals.

Joy: maximization of the amount of pleasure observed or estimated to be experienced by sentient beings across the universe

Growth: maximization of the amount of new pattern observed or estimated to be created throughout the universe

Choice: maximization of the degree to which sentient beings across the universe appear to be able to make choices (according e.g. to the notion of “natural autonomy”, a scientifically and rationally grounded analogue of the folk notion and subjective experience of “free will”)

Continuity: persistence of patterns over time. Obviously this is a counterbalance to Growth; the relative weighting of these two top-level goals will help determine the “conservatism” of a particular OpenCog system with the goal-set indicated here.

Novelty: the amount of new information in the system’s perceptions, actions and thoughts

Human pleasure and fulfillment: How much do humans, as a whole, appear to be pleased and fulfilled?

Human pleasure regarding the AGI system itself: How pleased do humans appear to be with the AGI system, and their interactions with it?

Self-preservation: a goal fulfilled if the system keeps itself “alive.” This is actually somewhat subtle for a digital system. It could be defined in a copying-friendly way, as preservation of the existence of sentiences whose mind-patterns have evolved from the mind-patterns of the current system with a reasonable degree of continuity.

This list of goals has a certain arbitrariness to it, and no doubt will evolve as OpenCog systems are experimented with. However, it comprises a reasonable “first stab” at a “roughly human-like” set of goal-content for an AGI system.
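To make the idea of a per-system weighting concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the numeric weights, the goal identifiers, and the `conservatism` and `overall_satisfaction` functions are hypothetical and do not come from the OpenCog design.

```python
# Hypothetical weights; neither the values nor the structure is prescribed.
GOAL_WEIGHTS = {
    "joy": 2.0,
    "growth": 3.0,
    "choice": 2.0,
    "continuity": 1.0,
    "novelty": 1.0,
    "human_fulfillment": 3.0,
    "human_approval": 2.0,
    "self_preservation": 1.0,
}


def normalize(weights):
    """Scale the weights so that they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {goal: w / total for goal, w in weights.items()}


def conservatism(weights):
    """Share of the Growth/Continuity balance given to Continuity
    (0.0 = all Growth, 1.0 = all Continuity)."""
    return weights["continuity"] / (weights["continuity"] + weights["growth"])


def overall_satisfaction(weights, satisfaction):
    """Weighted average of per-goal satisfaction estimates in [0, 1];
    goals with no estimate count as 0."""
    norm = normalize(weights)
    return sum(norm[goal] * satisfaction.get(goal, 0.0) for goal in norm)
```

With the weights shown, `conservatism(GOAL_WEIGHTS)` is 0.25, i.e. this imagined system leans toward Growth over Continuity. A linear weighted average is only the simplest possible way of combining goals, chosen here for brevity.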

One might wonder how such goals would be specified for an AGI system. Does one write source-code that attempts to embody some mathematical theory of continuity, pleasure, joy, etc.? For some goals mathematical formulae may be appropriate, e.g. novelty, which can be gauged information-theoretically in a plausible way. In most cases, though, I suspect the best way to define a goal for an AGI system will be using natural human language. Natural language is intrinsically ambiguous, but so are human values, and these ambiguities are closely coupled and intertwined. Even where a mathematical formula is given, it might be best to use natural language for the top-level goal, and supply the mathematical formula as an initial suggested means of achieving the NL-specified goal.

The AGI would need to be instructed – again, most likely in natural language – not to obsess on the specific wording supplied to it in its top-level goals, but rather to take the wording of its goals as indicative of general concepts that exist in human culture and can be expressed only approximatively in concise sequences of words. The specification of top-level goal content is not intended to precisely direct the AGI’s behavior in the way that, say, a thermostat is directed by the goal of keeping temperature within certain bounds. Rather, it is intended to point the AGI’s self-organizing activity in certain informally-specified directions.

Alongside explicitly goal-oriented activity, OpenCog also includes “background processing” – cognition simply aimed at learning new knowledge, and forgetting relatively unimportant knowledge. This knowledge provides background information useful for reasoning regarding goal-achievement, and also builds up a self-organizing, autonomously developing body of active information that may sometimes lead a system in unpredictable directions – for instance, to reinterpretation of its top-level goals.

The goals supplied to an OpenCog system by its programmers are best viewed as initial seeds around which the system forms its goals. For instance, a top-level goal of “novelty” may be specified as a certain mathematical formula for calculating the novelty of the system’s recent observations, actions and thoughts. However, this mathematical formula may be intractable in its most pure and general form, leading the system to develop various context-specific approximations to estimate the novelty experienced in different situations. These approximations, rather than the top-level novelty formula, will be what the system actually works to achieve. Improving these approximations will be part of the system’s activity, but how much attention to pay to improving these approximations will be a choice the system has to make as part of its thinking process. Potentially, if the approximations are bad, they might cause the system to delude itself that it is experiencing novelty (according to its top-level equation) when it actually isn’t, and also tell the system that there is no additional novelty to be found in improving its novelty estimation formulae.
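The gap between a novelty formula and a cheap approximation to it can be illustrated with a toy Python example. Both measures below are assumptions chosen for illustration: `surprisal_novelty` uses mean surprisal under a smoothed frequency model as one plausible information-theoretic gauge, and `change_rate_proxy` is a deliberately crude stand-in for a context-specific approximation. Neither is the formula an actual OpenCog system would use.

```python
import math
from collections import Counter


def surprisal_novelty(history, recent):
    """Mean surprisal, in bits, of `recent` observations under an
    add-one-smoothed frequency model built from `history`."""
    if not recent:
        return 0.0
    counts = Counter(history)
    vocabulary = set(history) | set(recent)
    total = len(history) + len(vocabulary)
    return sum(-math.log2((counts[x] + 1) / total) for x in recent) / len(recent)


def change_rate_proxy(recent):
    """Crude proxy: fraction of consecutive observations that differ."""
    if len(recent) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(recent, recent[1:]) if a != b)
    return changes / (len(recent) - 1)


history = ["A", "B"] * 50              # the system has seen A and B many times
familiar = ["A", "B", "A", "B"]        # constant surface change, nothing new
unfamiliar = ["C", "C", "C", "C"]      # no surface change, but never seen before
```

On these inputs the proxy rates the familiar alternation as more novel than the unfamiliar run, while the surprisal measure ranks them the other way round. A system optimizing the proxy would therefore believe it was experiencing novelty while, by the more principled measure, it was not.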

And this same sort of problem could occur with goals like “help cause people to be pleased and fulfilled.” Subgoals of the top-level goal may be created via more or less crude approximations; and these subgoals may influence how much effort goes into improving the approximations. Even if the system is wired to put a fixed amount of effort into improving its estimations regarding which subgoals should be pursued in pursuit of its top-level goals, the particular content of the subgoals will inevitably influence the particulars of how the system goes about improving these estimations.

The flexibility of an OpenCog system, its ability to ongoingly self-organize, learn and develop, brings the possibility that it could deviate from its in-built top-level goals in complex and unexpected ways. But this same flexibility is what should – according to the design intention – allow an OpenCog system to effectively absorb the complexity of human values. Via interacting with humans in rich ways – not just via getting reinforced on the goodness or badness of its actions (though such reinforcement will impact the system assuming it has goals such as “help cause human pleasure and fulfillment”), but via all sorts of joint activity with humans – the system will absorb the ins and outs of human psychology, culture and value. It will learn subgoals that approximately imply its top-level goals, in a way that fits with human nature, and with the specific human culture and community it’s exposed to as it grows.

In the above I have been speaking as if an OpenCog system is ongoingly stuck with the top-level goals that its human programmers have provided it with; but this is not necessarily the case. Operationally it is unproblematic to allow an OpenCog system to modify its top-level goals. One might consider this undesirable, yet a reflection on the uncertainty and ignorance necessarily going into any choice of goal-set may make one think otherwise.

A highly advanced intelligence, forced by design to retain top-level goals programmed by minds much more primitive than itself, could develop an undesirably contorted psychology, based on internally working around its fixed goal programming. Human psychology is replete with examples of this sort of problem. For instance, we humans are “programmed” with a great deal of highly-weighted goal content relevant to reproduction, sexuality and social status, but the more modern aspects of our minds have mixed feelings about these archaic evolved goals. Yet it is very hard for us to simply excise these historical goals from our minds. Instead we have created quite complex and subtle psychological and social patterns that indirectly and approximatively achieve the archaic goals encoded in our brains, while also letting us go in the directions in which our minds and cultures have self-organized during recent millennia. Hello Kitty, romantic love, birth control, athletic competitions, investment banks – the list of human-culture phenomena apparently explicable in this way is almost endless.

One key point to understand, closely relevant to the VLT, is that the foundation of OpenCog’s dynamics in explicit probabilistic inference will necessarily cause it to diverge somewhat from human judgments. As a probabilistically grounded system, OpenCog will naturally try to accurately estimate the probability of each abstraction it makes actually applying in each context it deems relevant. Humans sometimes do this – otherwise they wouldn’t be able to survive in the wild, let alone carry out complex activities like engineering computers or AI systems – but they also behave quite differently at times. Among other issues, humans are strongly prone to “wishful thinking” of various sorts. If one were to model human reasoning using a logical formalism, one might end up needing to include a rule of the rough form

X would imply achievement of my goals

therefore

X’s truth value gets boosted

Of course, a human being who applied this rule strongly to all X in their mind would become completely delusional and dysfunctional. No human is like that. But this sort of wishful thinking infuses human minds, alongside serious attempts at accurate probabilistic reasoning, plus various heuristics which have various well-documented systematic biases. Belief revision combines conclusions drawn via wishful thinking with conclusions drawn by attempts at accurate inference, in complex and mainly unconscious ways.
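The wishful-thinking rule and its blending with evidence-based inference can be caricatured in Python as follows. The function name `revise_belief`, the `wishfulness` parameter and the linear blending formula are all hypothetical simplifications introduced for illustration; they are not a model of human cognition or of OpenCog's belief revision.

```python
def revise_belief(evidence_estimate, implies_goal, wishfulness):
    """Blend an evidence-based probability with a wishful boost.

    evidence_estimate: probability of X from attempted accurate inference
    implies_goal:      True if X would imply achievement of the reasoner's goals
    wishfulness:       0.0 = purely evidence-driven, 1.0 = fully delusional
    """
    if not 0.0 <= evidence_estimate <= 1.0:
        raise ValueError("evidence_estimate must lie in [0, 1]")
    if not 0.0 <= wishfulness <= 1.0:
        raise ValueError("wishfulness must lie in [0, 1]")
    if not implies_goal:
        return evidence_estimate
    # Move the estimate part of the way toward certainty.
    return evidence_estimate + wishfulness * (1.0 - evidence_estimate)


sober = revise_belief(0.2, implies_goal=True, wishfulness=0.0)
hopeful = revise_belief(0.2, implies_goal=True, wishfulness=0.25)
delusional = revise_belief(0.2, implies_goal=True, wishfulness=1.0)
```

In this caricature, a reasoner that tries to estimate probabilities accurately corresponds to `wishfulness` near zero, while the fully delusional case corresponds to `wishfulness` equal to one. Human belief revision would sit somewhere in between and would vary with context.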

Some of the biases of human cognition are sensible consequences of trying to carry out complex probabilistic reasoning on complex data using limited space and time resources. Others are less “forgivable” and appear to exist in the human psyche for “historical reasons”, e.g. because they were adaptive for some predecessor of modern humanity in some contexts and then just stuck around.

An advanced OpenCog AGI system, if thoroughly embedded in human society and infused with human values, would likely arrive at its own variation of human values, differing from nearly any human being’s particular value system in its bias toward logical and probabilistic consistency. The closest approximation to such an OpenCog system’s value system might be the values of a human belonging to the human culture in which the OpenCog system was embedded, and who also had made great efforts to remove any (conscious or unconscious) logical inconsistencies in his value system.

What does this speculative scenario have to say about the VLT and VET?

Firstly, it seems to support a limited version of the VLT. An OpenCog system, due to its fundamentally different cognitive architecture, is not likely to inherit the logical and probabilistic inconsistencies of any particular human being’s value system. Rather, one would expect it to (implicitly and explicitly) seek to find the best approximation to the value system of its human friends and teachers, within the constraint of approximate probabilistic/logical consistency that is implicit in its architecture.

The precise nature of such a value system cannot be entirely clear at this moment, but is certainly an interesting topic for speculative thinking. First of all, it is fairly clear which sorts of properties of typical human value systems would not be inherited by an OpenCog of this hypothetical nature. For instance, humans have a tendency to place a great deal of extra value on goods or ills that occur in their direct sensory experience, much beyond what would be justified by the increased confidence associated with direct experience as opposed to indirect experience. Humans tend to value feeding a starving child sitting right in front of them vastly more than feeding a starving child halfway across the world. One would not expect a reasonably consistent human-like value system to display this property.

Similarly, humans tend to be much more concerned with goods or ills occurring to individuals who share more properties with themselves – and the choice of which properties to weight more highly in this sort of judgment is highly idiosyncratic and culture-specific. If an OpenCog system doesn’t have a top-level goal of “preserving patterns similar to the ones detected in my own mind and body”, then it would not be expected to have the same “tribal” value-system bias that humans tend to have. Some level of “tribal” value bias can be expected to emerge via abductive reasoning based on the goal of self-preservation (assuming this goal is included), but it seems qualitatively that humans have a much more tribally-oriented value system than could be derived via this sort of indirect factor alone. Humans evolved partially via tribe-level group selection; an AGI need not do so, and this would be expected to lead to significant value-system differences.

Overall, one might reasonably expect an OpenCog created with the above set of goals and methodology of embodiment and instruction to arrive at a value system that is roughly human-like, but without the glaring inconsistencies plaguing most practical human value systems. Many of the contradictory aspects of human values have to do with conflict between modern human culture and “historical” values that modern humans have carried over from early human history (e.g. tribalism). One may expect that, in the AGI’s value system, the modern culture side of such dichotomies will generally win out – because it is what is closer to the surface in observed human behavior and hence easier to detect and reason about, and also because it is more consilient with the explicitly Cosmist values (Joy, Growth, Choice) in the proposed first-pass AGI goal system.

So to a first approximation, one might expect an OpenCog system of this nature to settle into a value system that

Resembles the human values of the individuals who have instructed and interacted with it

Displays a strong (but still just approximate) logical and probabilistic consistency and coherence

Generally resolves contradictions in human values via selecting modern-culture value aspects over “archaic” historical value aspects

It seems likely that such a value system would generally be acceptable to human participants in modern culture who value logic, science and reason (alongside other human values). Obviously human beings who prefer the more archaic aspects of human values, and consider modern culture largely an ethical and aesthetic degeneration, would tend to be less happy with this sort of value system.

So in this view, an advanced OpenCog system appropriately architected and educated would validate the VLT, but with a moderately loose interpretation. Its value system would be in the broad scope of human-like value systems, but with a particular bias and with a kind of consistency and purity not likely present in any particular human being’s value system.

What about the VET? It seems intuitively likely that the ongoing growth and development of an OpenCog system as described above would parallel the growth and development of human uploads, cyborgs or biologically-enhanced humans who were, in the early stage of their posthuman evolution, specifically concerned with reducing their reliance on archaic values and increasing their coherence and logical and probabilistic consistency. Of course, this category might not include all posthumans – e.g. some religious humans, given the choice, might use advanced technology to modify their brains to cause themselves to become devout in their particular religion to a degree beyond all human limits. But it would seem that an OpenCog system as described above would be likely to evolve toward superhumanity in roughly the same direction as a human being with transhumanist proclivities and a roughly Cosmist outlook. If indeed this is the case, it would validate the VET, at least in this particular sort of situation.

It will certainly be noted that the value system of “a human being with transhumanist proclivities and a Cosmist outlook” is essentially the value system of the author of this article, and the author of the first-pass, roughly sketched OpenCog goal content used as the basis of the discussion here. Indeed, the goal system outlined above is closely matched to my own values. For instance, I tend toward technoprogressivism as opposed to transhumanist political libertarianism – and this is reflected in my inclusion of values related to the well-being of all sentient beings, and lack of focus on values regarding private property.

In fact, different weightings of the goals in the above-given goal-set would be expected to lead to different varieties of human-level and superhuman AGI value system – some of which would be more “technoprogressivist” in nature and some more “political libertarian” in nature, among many other differences. In a cosmic sense, though, this sort of difference is ultimately fairly minor. These are all variations of the modern human value system, and occupy a very small region in the space of all possible value systems that could be adopted by intelligences in our universe. Differences between different varieties of human value system often feel very important to us now, but may well appear quite insignificant to our superintelligent descendants.