reproducibilityindex.ai

Distributional Bellman Operators over Mean Embeddings

Authors: Li Kevin Wenliang, Gregoire Deletang, Matthew Aitchison, Marcus Hutter, Anian Ruoss, Arthur Gretton, Mark Rowland

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We provide asymptotic convergence theory using a novel error analysis approach, and examine the empirical performance of the algorithms on a suite of tabular tasks. Further, we show that this approach can be straightforwardly combined with deep reinforcement learning to give competitive performances.
Researcher Affiliation	Collaboration	Google DeepMind; Gatsby Unit, University College London.
Pseudocode	Yes	Algorithm 1 Sketch-DP/Sketch-TD
Open Source Code	Yes	Code is available at https://github.com/google-deepmind/sketch_dqn.
Open Datasets	Yes	Figure 5 shows the mean and median human-normalised performance on the Atari suite of environments (Bellemare et al., 2013) across 200M training frames
Dataset Splits	No	The paper mentions training data, but does not explicitly provide details about training/validation/test splits for reproducibility.
Hardware Specification	Yes	with each agent running on a single V100 GPU.
Software Dependencies	No	The paper mentions "SciPy's minimize algorithm (Virtanen et al., 2020)" but does not list its key software dependencies with specific version numbers.
Experiment Setup	Yes	The results in Figure 5 use the sigmoid base feature κ(x) = 1/(e^(−x) + 1) with slope s = 5, and 401 anchors (the number was tuned over 101, 201 and 401) evenly spaced between −12 and 12. ... We use exactly the same training procedure as QR-DQN (Dabney et al., 2018b). Notably, the learning rate, exploration schedule and buffer design are all the same. We tried a small hyperparameter sweep on the learning rate and found the default learning rate of 0.00005 to be optimal for performance at 200 million frames.
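The Pseudocode row cites Algorithm 1 (Sketch-DP/Sketch-TD) without reproducing it. The following is a minimal sketch of the idea suggested by the paper's title, under our own assumptions: a state's return distribution is summarised by its mean embedding m(x) = E[φ(G)]; the distributional Bellman backup for reward r and discount γ is approximated by a matrix B_r fitted so that φ(r + γz) ≈ B_r φ(z) over a grid of return values z; and a TD step moves m(x) toward B_r m(x'). The ridge-regularised fit, the grid, the feature form and all function names (`sigmoid_features`, `bellman_coefficients`, `sketch_td_update`) are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def sigmoid_features(z, anchors, slope):
    """Assumed features phi(z)_i = kappa(slope * (z - anchor_i)),
    with kappa(x) = 1 / (e^(-x) + 1). Returns shape (N, m) for N returns."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    x = slope * (z[:, None] - anchors[None, :])
    return 1.0 / (np.exp(-x) + 1.0)


def bellman_coefficients(r, gamma, anchors, slope, grid, ridge=1e-3):
    """Hypothetical fit of B_r with phi(r + gamma * z) ~= B_r @ phi(z),
    by ridge-regularised least squares over the return values in `grid`."""
    phi = sigmoid_features(grid, anchors, slope)                   # (N, m)
    phi_next = sigmoid_features(r + gamma * grid, anchors, slope)  # (N, m)
    gram = phi.T @ phi + ridge * np.eye(len(anchors))
    # Normal equations: (Phi^T Phi + ridge * I) B^T = Phi^T Phi_next.
    return np.linalg.solve(gram, phi.T @ phi_next).T


def sketch_td_update(m_x, m_next, b_r, lr):
    """One TD step on the embedding of state x after observing reward r
    and next state x': m(x) <- m(x) + lr * (B_r m(x') - m(x))."""
    return m_x + lr * (b_r @ m_next - m_x)


# Usage: a single-state chain with reward 1 and discount 0.5, so the
# next state is the same state and the return is 1 / (1 - 0.5) = 2.
anchors = np.linspace(-12.0, 12.0, 41)
grid = np.linspace(-10.0, 10.0, 201)
b_r = bellman_coefficients(1.0, 0.5, anchors, 5.0, grid)
m0 = sigmoid_features(0.0, anchors, 5.0)[0]  # start from the embedding of 0
m1 = sketch_td_update(m0, m0, b_r, lr=0.1)
```

A fixed point of this update satisfies m = B_r m; if the fitted B_r is accurate near z = 2, the embedding φ(2) approximately satisfies it, because 1 + 0.5 · 2 = 2. Fitting B_r once per reward value keeps each TD step a single matrix-vector product.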
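The Open Datasets row reports mean and median human-normalised performance on the Atari suite. A minimal sketch of that aggregation, assuming the common per-game definition (agent − random) / (human − random): the per-game random and human reference scores are not given on this page, so the scores below are made up, and the function names are hypothetical.

```python
from statistics import mean, median


def human_normalised(agent, random_score, human):
    """Per-game score: 0 matches a random policy, 1 matches the human reference."""
    return (agent - random_score) / (human - random_score)


def aggregate_hns(scores):
    """scores maps game name -> (agent, random, human) raw scores.

    Returns the (mean, median) human-normalised score across games.
    """
    hns = [human_normalised(a, r, h) for a, r, h in scores.values()]
    return mean(hns), median(hns)


# Hypothetical raw scores, for illustration only (not from the paper).
example = {
    "game_a": (150.0, 50.0, 150.0),   # HNS 1.0
    "game_b": (50.0, 0.0, 100.0),     # HNS 0.5
    "game_c": (400.0, 100.0, 200.0),  # HNS 3.0
}
print(aggregate_hns(example))  # (1.5, 1.0)
```

Reporting both statistics matters because a single game with a very large normalised score can pull the mean far above the median, as game_c does here.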
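The Experiment Setup row gives the feature configuration for Figure 5: base feature κ(x) = 1/(e^(−x) + 1), slope s = 5, and 401 anchors evenly spaced between −12 and 12. A minimal sketch of building that feature vector, assuming each feature is κ applied to the slope-scaled offset from one anchor, i.e. φ(z)_i = κ(s(z − a_i)); this exact form and the helper names are our assumptions, not taken from the paper's code.

```python
import math


def kappa(x):
    """Sigmoid base feature kappa(x) = 1 / (e^(-x) + 1), evaluated stably."""
    if x >= 0:
        return 1.0 / (math.exp(-x) + 1.0)
    e = math.exp(x)  # avoids overflow of e^(-x) for very negative x
    return e / (1.0 + e)


def make_anchors(num=401, low=-12.0, high=12.0):
    """num evenly spaced anchor points from low to high, inclusive."""
    step = (high - low) / (num - 1)
    return [low + i * step for i in range(num)]


def features(z, anchors, slope=5.0):
    """Assumed feature vector: phi(z)_i = kappa(slope * (z - anchor_i))."""
    return [kappa(slope * (z - a)) for a in anchors]


anchors = make_anchors()      # 401 anchors, as reported for Figure 5
phi = features(0.0, anchors)  # embedding of a return of exactly 0
```

Each feature is close to 1 for anchors well below z and close to 0 for anchors well above it, so the vector acts like a smoothed step at the return value. The slope controls how sharp that step is relative to the anchor spacing of 24/400 = 0.06.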