reproducibilityindex.ai

The Off-Switch Game

Authors: Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell

IJCAI 2017

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We analyze a simple game between a human H and a robot R, where H can press R's off switch but R can disable the off switch. A traditional agent takes its reward function for granted: we show that such agents have an incentive to disable the off switch, except in the special case where H is perfectly rational. Our key insight is that for R to want to preserve its off switch, it needs to be uncertain about the utility associated with the outcome, and to treat H's actions as important observations about that utility. (R also has no incentive to switch itself off in this setting.) We conclude that giving machines an appropriate level of uncertainty about their objectives leads to safer designs, and we argue that this setting is a useful generalization of the classical AI paradigm of rational agents.
Researcher Affiliation	Collaboration	Dylan Hadfield-Menell¹ and Anca Dragan¹ and Pieter Abbeel¹,²,³ and Stuart Russell¹; ¹University of California, Berkeley, ²OpenAI, ³International Computer Science Institute (ICSI); {dhm, anca, pabbeel, russell}@cs.berkeley.edu
Pseudocode	No	The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code	No	The paper does not provide any concrete access information for source code, nor does it state that code for the described methodology is released.
Open Datasets	No	The paper analyzes a theoretical game and does not use or refer to any publicly available or open datasets for training or evaluation.
Dataset Splits	No	The paper focuses on theoretical analysis and does not describe experiments requiring training, validation, or test dataset splits.
Hardware Specification	No	The paper is theoretical and does not describe any computational experiments or their hardware specifications.
Software Dependencies	No	The paper focuses on theoretical modeling and does not list any specific software dependencies with version numbers for experimental replication.
Experiment Setup	No	The paper describes a theoretical model and analysis, but does not include details on an experimental setup, hyperparameters, or training configurations.
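
The paper itself contains no pseudocode, so the following is only an illustrative sketch of the intuition quoted in the Research Type row, not the authors' formulation. It assumes a discrete belief over the utility of R's action, a payoff of 0 when R is switched off, and a simple "noisy" human who makes the wrong call with a fixed probability. All function names and the noise model are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical representation: R's belief is a list of
# (utility of the action to H, probability) pairs.
Belief = Sequence[Tuple[float, float]]
HumanPolicy = Callable[[float], float]  # utility -> probability H lets R act


def rational_human(u: float) -> float:
    """A perfectly rational H lets R act exactly when the utility is non-negative."""
    return 1.0 if u >= 0 else 0.0


def noisy_human(error_rate: float) -> HumanPolicy:
    """Assumed noise model: H makes the wrong decision with probability error_rate."""
    def policy(u: float) -> float:
        correct = rational_human(u)
        return (1 - error_rate) * correct + error_rate * (1 - correct)
    return policy


def value_act(belief: Belief) -> float:
    """Expected utility if R acts directly, bypassing the off switch."""
    return sum(p * u for u, p in belief)


def value_defer(belief: Belief, human: HumanPolicy) -> float:
    """Expected utility if R leaves the off switch enabled and lets H decide.

    Being switched off is assumed to be worth 0.
    """
    return sum(p * human(u) * u for u, p in belief)


def incentive_to_defer(belief: Belief, human: HumanPolicy) -> float:
    """Gain from deferring to H over R's best unilateral choice (act, or switch off for 0)."""
    return value_defer(belief, human) - max(value_act(belief), 0.0)


uncertain = [(-1.0, 0.5), (1.0, 0.5)]  # R does not know the sign of the utility
certain = [(1.0, 1.0)]                 # R takes its reward for granted

print(incentive_to_defer(uncertain, rational_human))   # positive: deferring helps
print(incentive_to_defer(certain, rational_human))     # zero: R is indifferent
print(incentive_to_defer(certain, noisy_human(0.25)))  # negative: R prefers to bypass H
```

Under these assumptions, an uncertain R gains from keeping the off switch, a certain R is indifferent when H is rational, and a certain R loses by deferring to an error-prone H, which matches the qualitative claims in the quoted abstract.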