reproducibilityindex.ai

Settling the Reward Hypothesis

Authors: Michael Bowling, John D Martin, David Abel, Will Dabney

ICML 2023

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	The reward hypothesis posits that all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward). We aim to fully settle this hypothesis. This will not conclude with a simple affirmation or refutation, but rather specify completely the implicit requirements on goals and purposes under which the hypothesis holds. Our work builds off this pair of insightful approaches by starting with preferences over histories... Altogether, our account does not give a simple affirmation or refutation of the reward hypothesis, but rather aims to completely specify the implicit requirements on goals and purposes under which the hypothesis holds.
Researcher Affiliation	Collaboration	¹Amii, University of Alberta; ²Intel Labs; ³DeepMind.
Pseudocode	Yes	Algorithm 1: Reward and Discount Design; Algorithm 2: Pref Sort; Algorithm 3: Pref Scale
Open Source Code	No	The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available.
Open Datasets	No	The paper is theoretical and does not involve empirical training on datasets. It focuses on formal definitions and proofs.
Dataset Splits	No	The paper is theoretical and does not discuss empirical dataset splits for training, validation, or testing.
Hardware Specification	No	The paper is theoretical and does not conduct experiments; therefore, no hardware specifications are mentioned.
Software Dependencies	No	The paper is theoretical and does not describe any experimental setup that would require specific software dependencies with version numbers.
Experiment Setup	No	The paper is theoretical and does not present an experimental setup with hyperparameters or training settings.