The Off-Switch Game
Authors: Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We analyze a simple game between a human H and a robot R, where H can press R's off switch but R can disable the off switch. A traditional agent takes its reward function for granted: we show that such agents have an incentive to disable the off switch, except in the special case where H is perfectly rational. Our key insight is that for R to want to preserve its off switch, it needs to be uncertain about the utility associated with the outcome, and to treat H's actions as important observations about that utility. (R also has no incentive to switch itself off in this setting.) We conclude that giving machines an appropriate level of uncertainty about their objectives leads to safer designs, and we argue that this setting is a useful generalization of the classical AI paradigm of rational agents. |
| Researcher Affiliation | Collaboration | Dylan Hadfield-Menell (1), Anca Dragan (1), Pieter Abbeel (1, 2, 3), and Stuart Russell (1). (1) University of California, Berkeley; (2) OpenAI; (3) International Computer Science Institute (ICSI). {dhm, anca, pabbeel, russell}@cs.berkeley.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access information for source code, nor does it state that code for the described methodology is released. |
| Open Datasets | No | The paper analyzes a theoretical game and does not use or refer to any publicly available or open datasets for training or evaluation. |
| Dataset Splits | No | The paper focuses on theoretical analysis and does not describe experiments requiring training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any computational experiments or their hardware specifications. |
| Software Dependencies | No | The paper focuses on theoretical modeling and does not list any specific software dependencies with version numbers for experimental replication. |
| Experiment Setup | No | The paper describes a theoretical model and analysis, but does not include details on an experimental setup, hyperparameters, or training configurations. |
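
The paper ships no code, so the following is only an illustrative sketch of the argument quoted in the Research Type row, not the authors' implementation. It assumes that switching off is normalized to utility 0, that H is perfectly rational (H allows the action only when its utility is non-negative), and that R's belief over the action's utility is a Gaussian represented by samples. The function name `off_switch_values` and all numeric values are hypothetical.

```python
import random


def off_switch_values(utility_samples):
    """Return R's expected value for (act, switch_off, defer).

    utility_samples: draws from R's belief over the action's utility U_a.
    """
    n = len(utility_samples)
    # Disable the off switch and act directly: R receives E[U_a].
    act = sum(utility_samples) / n
    # Switch itself off: utility normalized to 0 (assumption).
    switch_off = 0.0
    # Defer to a perfectly rational H, who lets the action proceed only
    # when U_a >= 0 and otherwise presses the off switch: E[max(U_a, 0)].
    defer = sum(max(u, 0.0) for u in utility_samples) / n
    return act, switch_off, defer


rng = random.Random(0)
# An uncertain robot: belief over U_a is Gaussian (illustrative parameters).
uncertain = [rng.gauss(0.25, 1.0) for _ in range(10_000)]
# A traditional robot that takes its reward for granted: a point belief.
certain = [0.25] * 10_000

for name, belief in (("uncertain", uncertain), ("certain", certain)):
    act, off, defer = off_switch_values(belief)
    print(f"{name}: act={act:.3f} off={off:.3f} defer={defer:.3f}")
```

Under these assumptions, deferring is worth strictly more than acting or switching off only when R's belief puts weight on both positive and negative utilities. With a point belief, deferring and acting are worth the same, so R gains nothing from keeping the off switch. Once H is modeled as less than perfectly rational, that tie breaks in favor of disabling the switch, as the abstract states.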