What Can Learned Intrinsic Rewards Capture?

Authors: Zeyu Zheng, Junhyuk Oh, Matteo Hessel, Zhongwen Xu, Manuel Kroiss, Hado van Hasselt, David Silver, Satinder Singh

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We present the results from our empirical investigations in two sections. We investigate these research questions in the grid-world domains illustrated in Figure 2." |
| Researcher Affiliation | Collaboration | "1 University of Michigan, 2 DeepMind. Correspondence to: Zeyu Zheng <zeyu@umich.edu>, Junhyuk Oh <junhyuk@google.com>." |
| Pseudocode | Yes | "Algorithm 1 Learning intrinsic rewards" |
| Open Source Code | No | No explicit statement about providing open-source code, and no link to a repository, was found in the paper. |
| Open Datasets | No | "We investigate these research questions in the grid-world domains illustrated in Figure 2. For each domain, we trained an intrinsic reward function across many lifetimes and evaluated it by training an agent using the learned reward." |
| Dataset Splits | No | No explicit mention of traditional training/validation/test splits (e.g., percentages or counts); the experiments involve interactive learning within simulated environments over lifetimes and episodes. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are mentioned in the main paper. |
| Software Dependencies | No | No specific software dependencies with version numbers are mentioned in the main text of the paper. |
| Experiment Setup | No | "The details of implementation and hyperparameters are described in the supplementary material." |
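The Pseudocode row refers to the paper's Algorithm 1, which learns an intrinsic reward function across many lifetimes: an inner loop trains a fresh agent that sees only the learned intrinsic reward, and an outer loop adjusts the intrinsic reward parameters so that the agent's extrinsic lifetime return improves. The sketch below illustrates that two-level structure only; the chain environment, the tabular reward parameterisation, the hyperparameters, and the finite-difference outer update are illustrative assumptions, not the paper's setup (the paper computes the meta-gradient analytically by backpropagating through the inner policy updates in its grid-world domains).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the paper's grid-worlds: a chain of N states,
# actions {0: left, 1: right}, extrinsic reward only at the rightmost state.
N = 6

def step(s, a):
    s2 = min(s + 1, N - 1) if a == 1 else max(s - 1, 0)
    r_ext = 1.0 if s2 == N - 1 else 0.0
    return s2, r_ext, s2 == N - 1

def run_lifetime(eta, inner_updates=30, lr=0.5, horizon=20):
    """Train a fresh policy from scratch using ONLY the intrinsic reward
    table eta[s, a]; return the accumulated extrinsic lifetime return."""
    theta = np.zeros((N, 2))              # softmax policy logits
    lifetime_ext = 0.0
    for _ in range(inner_updates):
        s, traj, ext = 0, [], 0.0
        for _ in range(horizon):          # roll out one episode
            p = np.exp(theta[s]); p /= p.sum()
            a = rng.choice(2, p=p)
            s2, r_ext, done = step(s, a)
            traj.append((s, a, eta[s, a]))  # agent is rewarded intrinsically
            ext += r_ext                    # extrinsic reward is hidden from it
            s = s2
            if done:
                break
        lifetime_ext += ext
        g = sum(r for _, _, r in traj)      # episodic intrinsic return
        for s_t, a_t, _ in traj:            # REINFORCE on the intrinsic reward
            p = np.exp(theta[s_t]); p /= p.sum()
            grad = -p
            grad[a_t] += 1.0                # grad of log softmax at chosen action
            theta[s_t] += lr * g * grad
    return lifetime_ext

# Outer loop: adapt eta to maximise extrinsic lifetime return. The paper
# obtains this meta-gradient analytically; a finite-difference estimate
# keeps this sketch short (and noisy, since lifetimes are stochastic).
eta = np.zeros((N, 2))
for it in range(20):
    grad = np.zeros_like(eta)
    for idx in np.ndindex(eta.shape):
        d = np.zeros_like(eta); d[idx] = 0.1
        grad[idx] = (run_lifetime(eta + d) - run_lifetime(eta - d)) / 0.2
    eta += 0.05 * grad
    print(f"meta-iter {it:2d}  lifetime extrinsic return: {run_lifetime(eta):.1f}")
```

Under this setup the outer loop tends to raise eta for rightward actions, reproducing in miniature the paper's point that a learned intrinsic reward can encode useful "what to do" knowledge that transfers to a newly initialised agent.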