Inverse Reward Design
Authors: Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart J. Russell, Anca Dragan
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results suggest that this approach can help alleviate negative side effects of misspecified reward functions and mitigate reward hacking. We evaluated our approaches in a model of the scenario from Figure 1 that we call Lavaland. Our system designer, Alice, is programming a mobile robot, Rob. We model this as a gridworld with movement in the four cardinal directions and four terrain types: target, grass, dirt, and lava. The true objective for Rob, w*, encodes that it should get to the target quickly, stay off the grass, and avoid lava. Alice designs a proxy that performs well in a training MDP that does not contain lava. Then, we measure Rob's performance in a test MDP that does contain lava. Our results show that combining IRD and risk-averse planning creates incentives for Rob to avoid unforeseen scenarios. |
| Researcher Affiliation | Collaboration | Department of Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA 94709; {dhm, smilli, pabbeel, russell, anca}@cs.berkeley.edu; OpenAI; International Computer Science Institute (ICSI) |
| Pseudocode | No | The paper describes methods using text and mathematical equations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code, nor does it provide a link to a code repository for the described methodology. |
| Open Datasets | No | The paper describes the creation of its experimental environment ("Lavaland") and sampling of data within it, but does not use or provide concrete access information for a publicly available or open dataset. |
| Dataset Splits | No | The paper describes the composition of its training and testing environments and data generation details but does not provide specific information about dataset validation splits (e.g., percentages, counts, or methodology for a validation set). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with versions) needed to replicate the experiment. |
| Experiment Setup | No | The paper mentions some parameters like 1000 examples and 50 dimensions for feature vectors but does not provide specific hyperparameter values, training configurations, or system-level settings (e.g., learning rates, batch sizes, optimizers) in the main text; further details are deferred to supplementary material. |
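
The Lavaland setup quoted in the Research Type row can be illustrated with a small sketch. It shows only the failure mode that motivates the paper: a proxy reward that matches the true reward on a lava-free training map can still lead a planner straight through lava at test time. Everything specific here is an assumption made for illustration: the numeric weights, the two tiny maps, the start cell, and the use of Dijkstra's algorithm as a stand-in planner. It does not implement IRD or the risk-averse planning evaluated in the paper.

```python
import heapq

# Terrain codes for a Lavaland-style gridworld (the letters are illustrative).
TARGET, GRASS, DIRT, LAVA = "T", "G", "D", "L"

# Hypothetical weights: the source gives only the qualitative preferences.
TRUE_W = {TARGET: 10.0, GRASS: -2.0, DIRT: -1.0, LAVA: -100.0}
# The proxy is tuned on a map without lava, so its lava weight is
# unconstrained by training; 0.0 is an arbitrary choice for this sketch.
PROXY_W = {TARGET: 10.0, GRASS: -2.0, DIRT: -1.0, LAVA: 0.0}

# Hypothetical maps: identical except that grass is swapped for lava.
TRAIN_MAP = ["DDDDD",
             "DGGGT"]
TEST_MAP = ["DDDDD",
            "DLLLT"]

MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # four cardinal directions


def plan(grid, start, weights):
    """Return a max-reward path from start to the target cell.

    Entering a non-target cell costs -weights[terrain] (non-negative here),
    so the best path is a shortest path and Dijkstra's algorithm applies.
    """
    rows, cols = len(grid), len(grid[0])
    frontier = [(0.0, start, [start])]
    settled = set()
    while frontier:
        cost, cell, path = heapq.heappop(frontier)
        if cell in settled:
            continue
        settled.add(cell)
        r, c = cell
        if grid[r][c] == TARGET:
            return path
        for dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in settled:
                terrain = grid[nr][nc]
                step = 0.0 if terrain == TARGET else -weights[terrain]
                heapq.heappush(
                    frontier, (cost + step, (nr, nc), path + [(nr, nc)])
                )
    return []  # target unreachable


def lava_steps(grid, path):
    """Count how many cells on the path are lava."""
    return sum(1 for r, c in path if grid[r][c] == LAVA)


if __name__ == "__main__":
    start = (1, 0)
    for name, w in [("proxy", PROXY_W), ("true", TRUE_W)]:
        train_path = plan(TRAIN_MAP, start, w)
        test_path = plan(TEST_MAP, start, w)
        print(name, "train path:", train_path)
        print(name, "lava cells entered at test:", lava_steps(TEST_MAP, test_path))
```

On the training map both weight vectors produce the same path, so nothing in training reveals that the proxy says nothing meaningful about lava. On the test map, the proxy-optimal path crosses the lava cells while the true-optimal path detours over dirt.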