The Off-Switch Game

Authors: Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell

IJCAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We analyze a simple game between a human H and a robot R, where H can press R's off switch but R can disable the off switch. A traditional agent takes its reward function for granted: we show that such agents have an incentive to disable the off switch, except in the special case where H is perfectly rational. Our key insight is that for R to want to preserve its off switch, it needs to be uncertain about the utility associated with the outcome, and to treat H's actions as important observations about that utility. (R also has no incentive to switch itself off in this setting.) We conclude that giving machines an appropriate level of uncertainty about their objectives leads to safer designs, and we argue that this setting is a useful generalization of the classical AI paradigm of rational agents.
Researcher Affiliation | Collaboration | Dylan Hadfield-Menell¹ and Anca Dragan¹ and Pieter Abbeel¹,²,³ and Stuart Russell¹; ¹University of California, Berkeley, ²OpenAI, ³International Computer Science Institute (ICSI); {dhm, anca, pabbeel, russell}@cs.berkeley.edu
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access information for source code, nor does it state that code for the described methodology is released.
Open Datasets | No | The paper analyzes a theoretical game and does not use or refer to any publicly available or open datasets for training or evaluation.
Dataset Splits | No | The paper focuses on theoretical analysis and does not describe experiments requiring training, validation, or test dataset splits.
Hardware Specification | No | The paper is theoretical and does not describe any computational experiments or their hardware specifications.
Software Dependencies | No | The paper focuses on theoretical modeling and does not list any specific software dependencies with version numbers for experimental replication.
Experiment Setup | No | The paper describes a theoretical model and analysis, but does not include details on an experimental setup, hyperparameters, or training configurations.
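
The Research Type entry says R keeps its off switch only when it is uncertain about the utility of the outcome and treats H's decision as evidence about that utility. The sketch below is one way to illustrate that reasoning numerically; it is not code from the paper, which releases none. It assumes a perfectly rational H who permits the action exactly when its utility is non-negative, a switched-off utility of 0, and a discrete belief over the action's utility. The function names `expected_values` and `incentive_to_defer` and the example beliefs are hypothetical.

```python
def expected_values(belief):
    """Return R's expected utility for each of its three options.

    belief: list of (utility, probability) pairs describing R's
    uncertainty about the utility of its action (hypothetical format).
    """
    # Option 1: act immediately, bypassing H (e.g. by disabling the switch).
    act = sum(u * p for u, p in belief)
    # Option 2: switch itself off; assumed to be worth exactly 0.
    switch_off = 0.0
    # Option 3: defer to H. A rational H (assumption) allows the action
    # only when its utility is non-negative, so R receives max(u, 0).
    defer = sum(max(u, 0.0) * p for u, p in belief)
    return act, switch_off, defer


def incentive_to_defer(belief):
    """Gain from deferring to H over R's best unilateral option."""
    act, switch_off, defer = expected_values(belief)
    return defer - max(act, switch_off)


# Uncertain R: the action is worth -1 or +2 with equal probability.
uncertain = [(-1.0, 0.5), (2.0, 0.5)]
# Certain R: a point-mass belief that the action is worth +1.
certain = [(1.0, 1.0)]

print(incentive_to_defer(uncertain))  # positive: keeping the switch helps
print(incentive_to_defer(certain))    # zero: nothing to gain from H's input
```

Under these assumptions the incentive is never negative, because max(u, 0) is at least u and at least 0 for every u. It is strictly positive only when the belief puts weight on both positive and negative utilities, which matches the summary's claim that uncertainty is what makes the off switch valuable to R.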