Are Humans ‘Human Compatible’?


I’ve just spent the last three days reading Stuart Russell’s new book on AI safety, ‘Human Compatible’. To be fair, I didn’t read continuously for three days: the book rewards thoughtful pauses to walk or drink coffee, because it nurtures reflection about what really matters.

You see, Russell has written a book about AI for social scientists that is also a book about social science for AI engineers, while at the same time providing the conceptual framework to bring us all ‘provably beneficial AI’.

‘Human Compatible’ is necessarily a whistle-stop tour of very diverse but interdependent thinking across computer science, philosophy, and the social sciences, and I recommend that all AI practitioners, technology policymakers, and social scientists read it.

The problem

The key elements of the book are as follows:

  • No matter how defensive some AI practitioners get, we all need to agree that there are risks inherent in developing systems that will outperform us
  • Chief among these risks is the concern that AI systems will achieve exactly the goals we set them, even in cases where we would prefer they hadn’t
  • Human preferences are complex, contextual, and change over time
  • Given the foregoing, we must avoid putting goals ‘in the machine’, and instead build systems that consult us appropriately about our preferences

Russell argues the case for all these points. The argument is informed by an impressive and important array of findings from philosophy, psychology, behavioural economics, and game theory, among other disciplines.

A key problem, as Russell sees it, is that most present-day technology optimizes a ‘fixed externally supplied objective’. This raises safety issues if the objective is not fully specified (which it never can be) and if the system is not easily reset (which is plausible for a range of AI systems).

The solution

Russell’s solution is that ‘provably beneficial AI’ will be engineered according to three guidelines:

  1. The machine’s only objective is to maximize the realization of human preferences
  2. The machine is initially uncertain about what those preferences are
  3. The ultimate source of information about human preferences is human behaviour

There are some mechanics that can be deployed to achieve such a design, drawing on game theory, utilitarian ethics, and an understanding of human psychology. Machines must defer to humans regularly and ask permission, and their programming will explicitly allow for the machines to be wrong, and therefore to be open to being switched off.
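The deference dynamic has a simple game-theoretic core. As an illustrative sketch (the utility distribution below is invented, not from the book): a machine that is uncertain how much the human values its proposed action does better, in expectation, by deferring, because the human will only let beneficial actions proceed.

```python
# Toy sketch of why an uncertain machine should defer to a human.
# The belief distribution is made up for illustration.

def expected_value(dist):
    """Expected utility of acting unilaterally, over (utility, probability) pairs."""
    return sum(u * p for u, p in dist)

def expected_value_if_deferring(dist):
    """Human approves only positive-utility actions; otherwise switches off (utility 0)."""
    return sum(max(u, 0.0) * p for u, p in dist)

# Machine's belief about the human's utility for its proposed action:
# 30% chance it is harmful (-10), 50% mildly good (+2), 20% very good (+5).
belief = [(-10.0, 0.3), (2.0, 0.5), (5.0, 0.2)]

act_now = expected_value(belief)               # -10*0.3 + 2*0.5 + 5*0.2 = -1.0
defer = expected_value_if_deferring(belief)    #   0*0.3 + 2*0.5 + 5*0.2 =  2.0

print(f"act unilaterally: {act_now:.1f}")
print(f"defer to human:   {defer:.1f}")
```

The point of the sketch is that deferring can never be worse than acting unilaterally: uncertainty about human preferences is exactly what makes the machine willing to ask permission, and to be switched off.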

Agree with Russell or disagree, he has provided a framework to which disparate parties can now refer: a common language and usable concepts, accessible to those from all disciplines, for progressing the AI safety dialogue.

If you think that goals should be hard-coded, then you must point out why Russell’s warnings about fixed goals are mistaken. If you think that human preferences can always be predicted, then you must explain why centuries of social science research is flawed. And be aware that Russell preempts many of the inadequate slogan-like responses to these concerns.

I found an interesting passage late in the book where the argument is briefly extended from machines to political systems. We vote every few years on a government (expressing our preferences). Yet the government then acts unilaterally (according to its goals) until the next election. Russell is disparaging of this process whereby ‘one byte of information’ is contributed by each person every few years. One can infer that he may also disapprove of the algorithms of large corporate entities with perhaps 2 billion users acting autonomously on the basis of ‘one byte’ of agreement with blanket terms and conditions.

Truly ‘human compatible’ AI will ask us regularly what we want, and then provide that to us, checking to make sure it has it right. It will not dish up solutions to satisfy a ‘goal in the machine’ which may not align with current human interests.

What do we want to want?

The book makes me think that we need to be aware that machines will be capable of changing our preferences (we already experience this with advertising), and indeed machines may do so in order to more easily satisfy the ‘goals in the machine’ (think of online engagement and recommendation engines). It seems that, thanks to machines, we are now capable of shaping our environment (digital or otherwise) in ways that shape people’s preferences. Ought this be allowed?

We must be aware of this risk. If you prefer A to B and are made to prefer B, on what grounds is that permissible? As Russell notes, would it ever make sense for someone to choose to switch from preferring A to preferring B, given that they currently prefer A?

This point actually runs very deep and a lot more philosophical thought needs to be deployed here. If we can build machines that can get us what we want, but we can also build machines that can change what we want, then we need to figure out an answer to the following deeply thought-provoking question, posed by Yuval Noah Harari at the end of his book ‘Sapiens’: ‘What do we want to want?’ There is no dismissive slogan answer to this problem.

What ought intelligence be for?

In the present context we are using ‘intelligence’ to refer to the operation of machines, but in a mid-2018 blog post I posed the question: what ought intelligence be used for? The point is that we are now debating how we ought to deploy AI, but what uses of other kinds of intelligence are permissible?

The process of developing and confronting an intelligence other than our own is cause for some self-reflexive thought. If there are certain features and uses of an artificial intelligence that we wouldn’t permit, then how are we justified in permitting similar goals and methods in humans? If Russell’s claim that we should want altruistic AI has any force, then why do we permit non-altruistic human behaviour?

Are humans ‘human compatible’?

I put down this book agreeing that we need to control AI (and indeed we can, according to Russell, with good engineering). But if intelligence is intelligence is intelligence, then must we not also turn to humans and constrain them in the same way, so that humans don’t pursue ‘goals inside the human’ that are significantly at odds with ‘our’ preferences?

The key here is defining ‘our’. Whose preferences matter? There is a deep and complex history of moral and political philosophy addressing this question, and AI developers would do well to familiarise themselves with key aspects of it. As would corporations, as would policymakers. Intelligence has for too long been used poorly.

Russell notes that many AI practitioners strongly resist regulation and may feel threatened when non-technical influences encroach on ‘their’ domain. But the deep questions above, coupled with the risks inherent due to ‘goals in the machine’, require an informed and collaborative approach to beneficial AI development. Russell is an accomplished AI practitioner speaking on behalf of philosophers to AI scientists, but hopefully this book will speak to everyone.
