What Can Learned Intrinsic Rewards Capture?

Authors: Zeyu Zheng, Junhyuk Oh, Matteo Hessel, Zhongwen Xu, Manuel Kroiss, Hado van Hasselt, David Silver, Satinder Singh

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We present the results from our empirical investigations in two sections. We investigate these research questions in the grid-world domains illustrated in Figure 2." |
| Researcher Affiliation | Collaboration | "1 University of Michigan, 2 DeepMind. Correspondence to: Zeyu Zheng <zeyu@umich.edu>, Junhyuk Oh <junhyuk@google.com>." |
| Pseudocode | Yes | "Algorithm 1 Learning intrinsic rewards" |
| Open Source Code | No | No explicit statement about providing open-source code, and no link to a repository, was found in the paper. |
| Open Datasets | No | "We investigate these research questions in the grid-world domains illustrated in Figure 2. For each domain, we trained an intrinsic reward function across many lifetimes and evaluated it by training an agent using the learned reward." |
| Dataset Splits | No | No explicit mention of traditional training/validation/test splits (e.g., percentages or counts); the experiments involve interactive learning within simulated environments over lifetimes and episodes. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are mentioned in the main paper. |
| Software Dependencies | No | No specific software dependencies with version numbers are mentioned in the main text of the paper. |
| Experiment Setup | No | "The details of implementation and hyperparameters are described in the supplementary material." |
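The Pseudocode row refers to the paper's Algorithm 1, which learns an intrinsic reward function across many lifetimes: an inner loop trains a fresh agent that sees only the learned intrinsic reward, and an outer loop adjusts the intrinsic reward parameters so that the agent's extrinsic lifetime return improves. The sketch below illustrates that two-level structure only; the chain environment, the tabular reward parameterisation, the hyperparameters, and the finite-difference outer update are illustrative assumptions, not the paper's setup (the paper computes the meta-gradient analytically by backpropagating through the inner policy updates in its grid-world domains).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the paper's grid-worlds: a chain of N states,
# actions {0: left, 1: right}, extrinsic reward only at the rightmost state.
N = 6

def step(s, a):
    s2 = min(s + 1, N - 1) if a == 1 else max(s - 1, 0)
    r_ext = 1.0 if s2 == N - 1 else 0.0
    return s2, r_ext, s2 == N - 1

def run_lifetime(eta, inner_updates=30, lr=0.5, horizon=20):
    """Train a fresh policy from scratch using ONLY the intrinsic reward
    table eta[s, a]; return the accumulated extrinsic lifetime return."""
    theta = np.zeros((N, 2))              # softmax policy logits
    lifetime_ext = 0.0
    for _ in range(inner_updates):
        s, traj, ext = 0, [], 0.0
        for _ in range(horizon):          # roll out one episode
            p = np.exp(theta[s]); p /= p.sum()
            a = rng.choice(2, p=p)
            s2, r_ext, done = step(s, a)
            traj.append((s, a, eta[s, a]))  # agent is rewarded intrinsically
            ext += r_ext                    # extrinsic reward is hidden from it
            s = s2
            if done:
                break
        lifetime_ext += ext
        g = sum(r for _, _, r in traj)      # episodic intrinsic return
        for s_t, a_t, _ in traj:            # REINFORCE on the intrinsic reward
            p = np.exp(theta[s_t]); p /= p.sum()
            grad = -p
            grad[a_t] += 1.0                # grad of log softmax at chosen action
            theta[s_t] += lr * g * grad
    return lifetime_ext

# Outer loop: adapt eta to maximise extrinsic lifetime return. The paper
# obtains this meta-gradient analytically; a finite-difference estimate
# keeps this sketch short (and noisy, since lifetimes are stochastic).
eta = np.zeros((N, 2))
for it in range(20):
    grad = np.zeros_like(eta)
    for idx in np.ndindex(eta.shape):
        d = np.zeros_like(eta); d[idx] = 0.1
        grad[idx] = (run_lifetime(eta + d) - run_lifetime(eta - d)) / 0.2
    eta += 0.05 * grad
    print(f"meta-iter {it:2d}  lifetime extrinsic return: {run_lifetime(eta):.1f}")
```

Under this setup the outer loop tends to raise eta for rightward actions, reproducing in miniature the paper's point that a learned intrinsic reward can encode useful "what to do" knowledge that transfers to a newly initialised agent.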