Compress and Control
Authors: Joel Veness, Marc Bellemare, Marcus Hutter, Alvin Chua, Guillaume Desjardins
AAAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also study the behavior of this technique when applied to various Atari 2600 video games, where the use of suboptimal modeling techniques is unavoidable. We consider three fundamentally different models, all too limited to perfectly model the dynamics of the system. Remarkably, we find that our technique provides sufficiently accurate value estimates for effective on-policy control. |
| Researcher Affiliation | Collaboration | Joel Veness, Marc G. Bellemare, Marcus Hutter, Alvin Chua, Guillaume Desjardins. Google DeepMind, Australian National University. {veness,bellemare,alschua,gdesjardins}@google.com; marcus.hutter@anu.edu.au |
| Pseudocode | Yes | Algorithm 1 CNC POLICY EVALUATION |
| Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Our first experiment involves a simplified version of the game of Blackjack (Sutton and Barto 1998, Section 5.1). We evaluated CNC using ALE, the Arcade Learning Environment (Bellemare et al. 2013), a reinforcement learning interface to the Atari 2600 video game platform. |
| Dataset Splits | No | The paper describes hyperparameter optimization but does not specify explicit training/validation/test dataset splits; such splits are not typical in reinforcement learning, where agents interact directly with the environment. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, or memory) used to run the experiments, only mentioning the use of the Stella Atari 2600 emulator for the environment. |
| Software Dependencies | No | The paper mentions several software components, estimators, and algorithms, such as ALE, the SAD estimator, logistic regression, ADAGRAD, Lempel-Ziv, and SkipCTS, but it does not give concrete version numbers for any of them, which would be needed for a reproducible setup. |
| Experiment Setup | Yes | The exploration rate ϵ was initialized to 1.0, then decayed linearly to 0.02 over the course of 200,000 time steps. The horizon was set to m = 80 steps, corresponding to roughly 5 seconds of play. The agents were evaluated over 10 trials, each lasting 2 million steps. The hyperparameters (including learning rate, choice of context, etc.) were optimized via the random sampling technique of Bergstra and Bengio (2012). |
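The Pseudocode row cites Algorithm 1 (CNC Policy Evaluation) without reproducing it. As a hedged illustration of the compress-and-control idea described in the abstract quoted above (value estimates obtained from imperfect density/compression models), the Python sketch below discretizes the finite-horizon return into bins, fits one density model per bin over visited states, and recovers a state's value by Bayes' rule. The class and function names (`CountDensityModel`, `cnc_policy_evaluation`, `bin_values`, `assign_bin`) are illustrative assumptions, not the paper's notation, and the toy count-based model stands in for the SAD, logistic regression, and SkipCTS models the paper actually uses.

```python
# Hedged sketch in the compress-and-control spirit: discretize the m-step
# return into bins, fit one density model per bin over observed states,
# and recover V(s) by Bayes' rule. Names are illustrative, not the paper's.

from collections import Counter


class CountDensityModel:
    """Toy stand-in for a compressor/density model over discrete states."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def update(self, state):
        self.counts[state] += 1
        self.total += 1

    def prob(self, state):
        # Laplace-smoothed probability estimate.
        return (self.counts[state] + 1) / (self.total + len(self.counts) + 1)


def cnc_policy_evaluation(trajectories, bin_values, assign_bin):
    """trajectories: list of (state, m-step return) pairs gathered on-policy.
    bin_values: representative return value for each bin.
    assign_bin: maps an observed return to a bin index."""
    n_bins = len(bin_values)
    models = [CountDensityModel() for _ in range(n_bins)]
    bin_counts = [0] * n_bins

    # Train one density model per return bin.
    for state, ret in trajectories:
        b = assign_bin(ret)
        models[b].update(state)
        bin_counts[b] += 1

    total = sum(bin_counts)
    priors = [c / total for c in bin_counts]

    def value(state):
        # Bayes' rule: P(bin | state) is proportional to rho_bin(state) * P(bin).
        posteriors = [models[b].prob(state) * priors[b] for b in range(n_bins)]
        z = sum(posteriors)
        return sum(v * p / z for v, p in zip(bin_values, posteriors))

    return value
```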
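The Experiment Setup row gives concrete numbers for the ALE runs. Below is a minimal sketch of that schedule, assuming a standard linear ε decay and plain random search over hyperparameters as in Bergstra and Bengio (2012); the search ranges are placeholders, since the table does not report them.

```python
import random

# Values taken from the reported setup (Experiment Setup row above).
EPS_START, EPS_END = 1.0, 0.02
EPS_DECAY_STEPS = 200_000
HORIZON_M = 80             # roughly 5 seconds of Atari play
STEPS_PER_TRIAL = 2_000_000
NUM_TRIALS = 10


def epsilon(step):
    """Linear decay of the exploration rate from 1.0 to 0.02 over 200,000 steps."""
    frac = min(step / EPS_DECAY_STEPS, 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)


def sample_hyperparameters():
    """Random-search sampling in the spirit of Bergstra and Bengio (2012).
    The ranges below are illustrative placeholders, not the paper's."""
    return {
        "learning_rate": 10 ** random.uniform(-4, -1),
        "context_length": random.choice([2, 4, 8, 16]),
    }
```

Each configuration would then be evaluated over 10 independent trials of 2 million steps each, matching the evaluation protocol quoted in the table.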