reproducibilityindex.ai

Composing Value Functions in Reinforcement Learning

Authors: Benjamin Van Niekerk, Steven James, Adam Earle, Benjamin Rosman

ICML 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To demonstrate composition, we perform a series of experiments in a high-dimensional video game (Figure 1b). Results show that an agent is able to compose existing policies learned from high-dimensional pixel input to generate new, optimal behaviours.
Researcher Affiliation	Academia	(1) School of Computer Science and Applied Mathematics, University of the Witwatersrand, Johannesburg, South Africa; (2) Council for Scientific and Industrial Research, Pretoria, South Africa.
Pseudocode	Yes	Algorithm 1 Soft Value Iteration, Algorithm 2 Soft Policy Iteration
Open Source Code	No	The paper does not provide a link to source code or explicitly state that source code for the methodology is being released.
Open Datasets	No	The paper describes custom tasks within a video game domain developed for the experiments, but does not provide access information for a publicly available dataset. It states 'We construct a number of different tasks based on the objects that the agent must collect'.
Dataset Splits	No	The paper describes training and evaluation on a custom video game environment (e.g., 'Each network is trained for 1.5m timesteps', 'Returns from 50k episodes'), but does not specify explicit training, validation, or test dataset splits.
Hardware Specification	No	No specific hardware details (such as GPU or CPU models, or cloud computing specifications) used for running experiments were mentioned.
Software Dependencies	No	The paper mentions '(soft) deep Q-learning' but does not specify any software names with version numbers (e.g., programming languages, libraries, or frameworks).
Experiment Setup	Yes	Each network is trained for 1.5m timesteps to ensure near-optimal convergence. The input to our network is a single RGB frame of size 84 × 84, which is passed through three convolutional layers and two fully-connected layers before outputting the predicted Q-values for the given state.
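
The Experiment Setup row gives the input size (one 84 × 84 RGB frame) and the layer counts, but not the kernel sizes, strides, or channel widths. As a rough illustration of how the feature map shrinks before the fully-connected layers, the sketch below traces tensor shapes through three unpadded convolutions. The layer hyperparameters (`ASSUMED_LAYERS`) are the widely used DQN-style values and are an assumption made here for illustration; they are not stated in the quoted text, and the names `Conv`, `conv_out`, and `trace_shapes` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conv:
    """One unpadded convolutional layer (hypothetical description)."""
    out_channels: int
    kernel: int
    stride: int


def conv_out(size: int, kernel: int, stride: int) -> int:
    """Spatial size after a convolution with no padding."""
    return (size - kernel) // stride + 1


def trace_shapes(size: int, channels: int, layers: list) -> list:
    """Return the (channels, height, width) shape after each layer."""
    shapes = [(channels, size, size)]
    for layer in layers:
        size = conv_out(size, layer.kernel, layer.stride)
        shapes.append((layer.out_channels, size, size))
    return shapes


# ASSUMPTION: common DQN-style hyperparameters, not taken from the paper.
ASSUMED_LAYERS = [Conv(32, 8, 4), Conv(64, 4, 2), Conv(64, 3, 1)]

# Input from the source: a single RGB frame of size 84 x 84.
shapes = trace_shapes(size=84, channels=3, layers=ASSUMED_LAYERS)

# Number of features fed to the first fully-connected layer.
c, h, w = shapes[-1]
flat_features = c * h * w

if __name__ == "__main__":
    for shape in shapes:
        print(shape)
    print("flattened features:", flat_features)
```

Under these assumed hyperparameters the spatial size goes 84 → 20 → 9 → 7, so the first fully-connected layer would receive 64 × 7 × 7 = 3136 features. A different choice of kernels or strides changes these numbers, which is why the missing hyperparameters matter for reproduction.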