Inverse Reward Design

Authors: Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart J. Russell, Anca Dragan

NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results suggest that this approach can help alleviate negative side effects of misspecified reward functions and mitigate reward hacking. We evaluated our approaches in a model of the scenario from Figure 1 that we call Lavaland. Our system designer, Alice, is programming a mobile robot, Rob. We model this as a gridworld with movement in the four cardinal directions and four terrain types: target, grass, dirt, and lava. The true objective for Rob, w*, encodes that it should get to the target quickly, stay off the grass, and avoid lava. Alice designs a proxy that performs well in a training MDP that does not contain lava. Then, we measure Rob's performance in a test MDP that does contain lava. Our results show that combining IRD and risk-averse planning creates incentives for Rob to avoid unforeseen scenarios. (A minimal gridworld sketch of this setup appears after the table.)
Researcher Affiliation | Collaboration | Department of Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA 94709, {dhm, smilli, pabbeel, russell, anca}@cs.berkeley.edu; OpenAI; International Computer Science Institute (ICSI)
Pseudocode | No | The paper describes its methods using text and mathematical equations but does not include any structured pseudocode or algorithm blocks. (An illustrative, unofficial sketch of the core computation is given after the table.)
Open Source Code | No | The paper does not contain an explicit statement about releasing source code, nor does it provide a link to a code repository for the described methodology.
Open Datasets | No | The paper describes the creation of its experimental environment ("Lavaland") and sampling of data within it, but does not use or provide concrete access information for a publicly available or open dataset.
Dataset Splits | No | The paper describes the composition of its training and testing environments and its data-generation details, but does not specify a validation split (e.g., percentages, counts, or the methodology for constructing one).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with versions) needed to replicate the experiment.
Experiment Setup | No | The paper mentions a few parameters, such as 1000 examples and 50-dimensional feature vectors, but does not provide specific hyperparameter values, training configurations, or system-level settings (e.g., learning rates, batch sizes, optimizers) in the main text; further details are deferred to the supplementary material.
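
To make the Lavaland setup described in the Research Type row concrete, here is a minimal, hypothetical Python sketch of the gridworld. The four terrain types are the paper's; the grid layouts and the specific reward weights are illustrative assumptions, not the authors' actual values.

```python
import numpy as np

# Terrain indices for the four Lavaland terrain types (indices are arbitrary).
TARGET, GRASS, DIRT, LAVA = 0, 1, 2, 3
N_TERRAIN = 4

def one_hot_features(terrain):
    """Feature vector phi(s): indicator of the terrain type in the current cell."""
    phi = np.zeros(N_TERRAIN)
    phi[terrain] = 1.0
    return phi

# Proxy reward Alice designs on the training MDP (which contains no lava):
# reach the target, stay off grass, dirt is mildly costly. The lava weight is
# left at 0 because Alice never anticipated lava. (Values are illustrative.)
w_proxy = np.array([10.0, -5.0, -1.0, 0.0])

# True reward Alice would have written had she anticipated lava.
w_true = np.array([10.0, -5.0, -1.0, -50.0])

# Training MDP: a small grid with no lava cells (layout is a made-up example).
train_grid = np.array([
    [DIRT, DIRT, GRASS],
    [DIRT, GRASS, DIRT],
    [DIRT, DIRT, TARGET],
])

# Test MDP: the same task, but a lava pocket appears on the short path.
test_grid = np.array([
    [DIRT, LAVA, GRASS],
    [DIRT, LAVA, DIRT],
    [DIRT, DIRT, TARGET],
])

def reward(grid, cell, w):
    """Linear reward w . phi(s) for the cell's terrain."""
    return float(w @ one_hot_features(grid[cell]))

# On a lava cell the proxy is indifferent while the true objective is strongly
# negative; this gap is the misspecification IRD is meant to surface.
print(reward(test_grid, (0, 1), w_proxy))  # 0.0
print(reward(test_grid, (0, 1), w_true))   # -50.0
```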
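Since the paper presents its method only as prose and equations, the following rough sketch, assuming a discrete set of candidate true rewards and candidate proxies, illustrates the two ingredients it combines: the IRD posterior over true reward weights given the observed proxy, and a simple worst-case form of risk-averse planning. The function names and the particular risk criterion are assumptions for illustration, not the authors' exact formulation.

```python
import numpy as np

def ird_posterior(proxy_idx, candidate_ws, phi_tilde, beta=1.0):
    """Posterior over candidate true reward weights, given the proxy Alice chose.

    candidate_ws : (K, d) candidate true weight vectors w
    phi_tilde    : (M, d) expected feature counts of the behavior an agent obtains
                   when optimizing the m-th candidate proxy in the *training* MDP
                   (computed by whatever planner is available)
    proxy_idx    : index of the proxy Alice actually wrote down
    """
    # Designer model: P(proxy_m | w) is proportional to exp(beta * w . phi_tilde[m]),
    # normalized over all proxies Alice could have chosen. The normalization is the
    # key step: a proxy is only evidence about features it actually traded off in
    # the training MDP.
    scores = beta * candidate_ws @ phi_tilde.T          # (K, M)
    scores -= scores.max(axis=1, keepdims=True)         # numerical stability
    likelihood = np.exp(scores)
    likelihood /= likelihood.sum(axis=1, keepdims=True)

    # Uniform prior over candidate true rewards; condition on the observed proxy.
    posterior = likelihood[:, proxy_idx]
    return posterior / posterior.sum()

def risk_averse_value(traj_features, candidate_ws, posterior, eps=1e-6):
    """Score a trajectory by its worst-case reward over the posterior's support,
    one simple instance of the risk-averse planning the paper pairs with IRD."""
    values = candidate_ws @ traj_features               # reward under each candidate w
    return values[posterior > eps].min()
```

A planner in the test MDP would then prefer the trajectory maximizing risk_averse_value rather than the proxy reward, which is what creates the incentive to steer around the unforeseen lava cells.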