Machine learning (ML) algorithms can already recognize patterns far better than the humans they’re working for. This allows them to generate predictions and make decisions in a variety of high-stakes situations. For example, electricians use IBM Watson’s predictive capabilities to anticipate clients’ needs; Uber’s self-driving system determines what route will get passengers to their destination the fastest; and Insilico Medicine leverages its drug discovery engine to identify avenues for new pharmaceuticals.
As data-driven learning systems continue to advance, it would be easy enough to define “success” according to technical improvements, such as increasing the amount of data algorithms can synthesize and, thereby, improving the efficacy of their pattern identifications. However, for ML systems to truly be successful, they need to understand human values. More to the point, they need to be able to weigh our competing desires and demands, understand what outcomes we value most, and act accordingly.
In order to highlight the kinds of ethical decisions that our ML systems are already contending with, Kaj Sotala, a researcher in Finland working for the Foundational Research Institute, turns to traffic analysis and self-driving cars. Should a toll road be used in order to shave five minutes off the commute, or would it be better to take the longer route in order to save money?
Answering that question is not as easy as it may seem.
For example, Person A may prefer to take a toll road that costs five dollars if it will save five minutes, but they may not want to take the toll road if it costs them ten dollars. Person B, on the other hand, might always prefer taking the shortest route regardless of price, as they value their time above all else.
In this situation, Sotala notes that we are ultimately asking the ML system to determine what humans value more: time or money. Consequently, what seems like a simple question about what road to take quickly becomes a complex analysis of competing values. “Someone might think, ‘Well, driving directions are just about efficiency. I’ll let the AI system tell me the best way of doing it.’ But another person might feel that there is some value in having a different approach,” he said.
While it’s true that ML systems have to weigh our values and make tradeoffs in all of their decisions, Sotala notes that this isn’t a problem at the present juncture. The tasks that the systems are dealing with are simple enough that researchers are able to manually enter the necessary value information. However, as AI agents increase in complexity, Sotala explains that they will need to be able to account for and weigh our values on their own.
Understanding Utility-Based Agents
When it comes to incorporating values, Sotala notes that the problem comes down to how intelligent agents make decisions. A thermostat, for example, is a type of reflex agent. It knows when to start heating a house because of a set, predetermined temperature: the thermostat turns the heating system on when the room temperature falls below a certain temperature and turns it off when the room temperature rises above a certain temperature. Goal-based agents, on the other hand, make decisions based on achieving specific goals. For example, an agent whose goal is to buy everything on a shopping list will continue its search until it has found every item.
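The difference between the two agent types can be sketched in a few lines of Python. This is only an illustration: the function names, the temperature thresholds, and the store data are assumptions made up for the example, not anything Sotala describes.

```python
def thermostat(temp_c, heating_on, low=19.0, high=21.0):
    """Reflex agent: maps the current reading directly to an action.

    The thresholds `low` and `high` are hypothetical values.
    """
    if temp_c < low:
        return True        # too cold: turn the heating on
    if temp_c > high:
        return False       # warm enough: turn the heating off
    return heating_on      # inside the band: keep the current state


def shop(shopping_list, stores):
    """Goal-based agent: keeps visiting stores until every item is found.

    `stores` is a list of (name, set_of_items) pairs, visited in order.
    Returns the stores where something was bought and any items still missing.
    """
    remaining = set(shopping_list)
    visited = []
    for name, stock in stores:
        if not remaining:
            break          # goal reached, stop searching
        found = remaining & stock
        if found:
            visited.append(name)
            remaining -= found
    return visited, remaining
```

The thermostat never looks beyond the current reading, while the shopping agent keeps track of what is still missing and stops only once its goal is met. Neither one has any way to say that one outcome is better than another.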
Utility-based agents are a step above goal-based agents. They can deal with tradeoffs like the following: “Getting milk is more important than getting new shoes today. However, I’m closer to the shoe store than the grocery store, and both stores are about to close. I’m more likely to get the shoes in time than the milk.” At each decision point, utility-based agents are presented with a number of options that they must choose from. Every option is associated with a specific “utility” or reward. To reach their goal, the agents follow the decision path that will maximize the total rewards.
From a technical standpoint, utility-based agents rely on “utility functions” to make decisions. These are formulas that the systems use to synthesize data, balance variables, and maximize rewards. Ultimately, the decision path that gives the most rewards is the one that the systems are taught to select in order to complete their tasks. While these utility-based systems excel at finding patterns and responding to rewards, Sotala asserts that current utility-based agents assume a fixed set of priorities. As a result, these methods are insufficient when it comes to future AGI systems, which will be acting autonomously and so will need a more sophisticated understanding of when humans’ values change and shift.
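As a rough illustration of what such a utility function can look like, the milk-versus-shoes tradeoff can be written as an expected-utility calculation. The numbers below are invented for the example: the utilities encode "milk matters more than shoes," and the probabilities encode "I'm more likely to reach the shoe store in time."

```python
# Hypothetical utilities and success probabilities, chosen only to
# mirror the milk-versus-shoes example.
OPTIONS = [
    {"name": "milk", "utility": 10.0, "p_success": 0.3},
    {"name": "shoes", "utility": 6.0, "p_success": 0.9},
]


def expected_utility(option):
    """Reward of an option weighted by the chance of actually getting it."""
    return option["utility"] * option["p_success"]


def choose(options):
    """Pick the option with the highest expected utility."""
    return max(options, key=expected_utility)


best = choose(OPTIONS)
print(best["name"])
```

With these particular numbers the agent heads for the shoe store, even though milk is valued more, because the likely payoff is higher. Note that the utilities are fixed constants, which is exactly the limitation Sotala points to.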
For example, a person may always value taking the longer route to avoid a highway and save money, but not if they are having a heart attack and trying to get to an emergency room. How is an AI agent supposed to anticipate and understand when our values of time and money change? This issue is further complicated because, as Sotala points out, humans often value things independently of whether they have ongoing, tangible rewards. Sometimes humans even value things that may, in some respects, cause harm. Consider an adult who values privacy but whose doctor or therapist may need access to intimate and deeply personal information — information that may be lifesaving. Should the AI agent reveal the private information or not?
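The route example can be made concrete with a small sketch. It assumes, purely for illustration, that a passenger's preferences reduce to a single "dollars per minute" weight; the routes, tolls, and weights are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str
    minutes: float
    toll: float


def route_cost(route, dollars_per_minute):
    """Combine money and time into one number using a value-of-time weight."""
    return route.toll + route.minutes * dollars_per_minute


def pick_route(routes, dollars_per_minute):
    """Choose the route with the lowest combined cost."""
    return min(routes, key=lambda r: route_cost(r, dollars_per_minute))


ROUTES = [Route("toll road", 20, 5.0), Route("free road", 25, 0.0)]

# Everyday weight (hypothetical): time is worth $0.50 per minute.
everyday = pick_route(ROUTES, 0.5)
# Emergency weight (hypothetical): time is worth $50 per minute.
emergency = pick_route(ROUTES, 50.0)
print(everyday.name, "|", emergency.name)
```

An agent that only ever uses the everyday weight will keep choosing the free road, even on the way to the emergency room. The code can switch weights only because a human supplied the second number by hand; nothing in it knows when or why the weight should change.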
Ultimately, Sotala explains that utility-based agents are too simple and don’t get to the root of human behavior. “Utility functions describe behavior rather than the causes of behavior… they are more of a descriptive model, assuming we already know roughly what the person is choosing.” While a descriptive model might recognize that passengers prefer saving money, it won’t understand why, and so it won’t be able to anticipate or determine when other values override “saving money.”
An AI Agent Creates a Queen
At its core, Sotala emphasizes that the fundamental problem is ensuring that AI systems are able to uncover the models that govern our values. This will allow them to use these models to determine how to respond when confronted with new and unanticipated situations. As Sotala explains, “AIs will need to have models that allow them to roughly figure out our evaluations in totally novel situations, the kinds of value situations where humans might not have any idea in advance that such situations might show up.”
In some domains, AI systems have surprised humans by uncovering our models of the world without human input. As one early example, Sotala references research with “word embeddings” where an AI system was tasked with classifying sentences as valid or invalid. In order to complete this classification task, the system identified relationships between certain words. For example, as the AI agent noticed a male/female dimension to words, it created a relationship that allowed it to get from “king” to “queen” and vice versa.
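The king/queen relationship is usually demonstrated with vector arithmetic. The sketch below uses tiny hand-written vectors (the three dimensions loosely stand for royalty, gender, and food) as an assumption for illustration; real word embeddings are learned from text, have hundreds of dimensions, and are not labeled this neatly.

```python
import math

# Toy, hand-written vectors: (royalty, gender, food). Not learned from data.
VECTORS = {
    "king": (1.0, 1.0, 0.0),
    "queen": (1.0, -1.0, 0.0),
    "man": (0.0, 1.0, 0.0),
    "woman": (0.0, -1.0, 0.0),
    "apple": (0.0, 0.0, 1.0),
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = tuple(
        x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])
    )
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))


print(analogy("king", "man", "woman"))
```

Subtracting "man" and adding "woman" moves the vector along the gender dimension while leaving royalty untouched, which lands on "queen"; running the same arithmetic in the other direction leads back to "king."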
Since then, there have been systems which have learned more complex models and associations. For example, OpenAI’s recent GPT-2 system has been trained to read some writing and then write the kind of text that might follow it. When given a prompt of “For today’s homework assignment, please describe the reasons for the US Civil War,” it writes something that resembles a high school essay about the US Civil War. When given a prompt of “Legolas and Gimli advanced on the orcs, raising their weapons with a harrowing war cry,” it writes what sounds like Lord of the Rings-inspired fanfiction, including names such as Aragorn, Gandalf, and Rivendell in its output.
Sotala notes that in both cases, the AI agent “made no attempt of learning like a human would, but it tried to carry out its task using whatever method worked, and it turned out that it constructed a representation pretty similar to how humans understand the world.”
There are obvious benefits to AI systems that are able to automatically learn better ways of representing data and, in so doing, develop models that correspond to humans’ values. When humans can’t determine how to map, and subsequently model, values, AI systems could identify patterns and create appropriate models by themselves. However, the opposite could also happen — an AI agent could construct something that seems like an accurate model of human associations and values but is, in reality, dangerously misaligned.
For instance, suppose an AI agent learns that humans want to be happy, and in an attempt to maximize human happiness, it hooks our brains up to computers that provide electrical stimuli that give us feelings of constant joy. In this case, the system understands that humans value happiness, but it does not have an appropriate model of how happiness corresponds to other competing values like freedom.
“In one sense, it’s making us happy and removing all suffering, but at the same time, people would feel that ‘no, that’s not what I meant when I said the AI should make us happy.’”