Settling the Reward Hypothesis
Authors: Michael Bowling, John D Martin, David Abel, Will Dabney
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | The reward hypothesis posits that "all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward)." We aim to fully settle this hypothesis. This will not conclude with a simple affirmation or refutation, but rather specify completely the implicit requirements on goals and purposes under which the hypothesis holds. Our work builds off this pair of insightful approaches by starting with preferences over histories... Altogether, our account does not give a simple affirmation or refutation of the reward hypothesis, but rather aims to completely specify the implicit requirements on goals and purposes under which the hypothesis holds. |
| Researcher Affiliation | Collaboration | (1) Amii, University of Alberta; (2) Intel Labs; (3) DeepMind. |
| Pseudocode | Yes | Algorithm 1: Reward and Discount Design; Algorithm 2: Pref Sort; Algorithm 3: Pref Scale |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available. |
| Open Datasets | No | The paper is theoretical and does not involve empirical training on datasets. It focuses on formal definitions and proofs. |
| Dataset Splits | No | The paper is theoretical and does not discuss empirical dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not conduct experiments; therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe any experimental setup that would require specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not present an experimental setup with hyperparameters or training settings. |
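
The Research Type row describes goals as "maximization of the expected value of the cumulative sum of a received scalar signal". As a rough illustration of that phrase only, the sketch below averages summed rewards over a few sampled histories and picks the behaviour with the larger estimate. It does not reproduce the paper's Algorithms 1 to 3. The behaviour names, the reward sequences, and the choice of a finite, undiscounted sum are all assumptions made for this example.

```python
from statistics import mean


def cumulative_reward(rewards):
    """Sum of the scalar rewards received along one history."""
    return sum(rewards)


def expected_return(histories):
    """Monte Carlo estimate: mean cumulative reward over sampled histories."""
    return mean(cumulative_reward(h) for h in histories)


def preferred(samples_by_behaviour):
    """Name of the behaviour with the largest estimated expected return."""
    return max(
        samples_by_behaviour,
        key=lambda name: expected_return(samples_by_behaviour[name]),
    )


# Hypothetical reward sequences (not from the paper), two histories each.
samples = {
    "cautious": [[1, 1, 1], [1, 0, 1]],
    "greedy": [[3, 0, 0], [0, 0, 0]],
}

print(preferred(samples))
```

Here "cautious" has summed rewards 3 and 2 and "greedy" has 3 and 0, so the comparison rests entirely on the averaged scalar totals, which is the kind of ranking the hypothesis refers to.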