Distributional Bellman Operators over Mean Embeddings

Authors: Li Kevin Wenliang, Gregoire Deletang, Matthew Aitchison, Marcus Hutter, Anian Ruoss, Arthur Gretton, Mark Rowland

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide asymptotic convergence theory using a novel error analysis approach, and examine the empirical performance of the algorithms on a suite of tabular tasks. Further, we show that this approach can be straightforwardly combined with deep reinforcement learning to give competitive performances.
Researcher Affiliation | Collaboration | 1Google DeepMind, 2Gatsby Unit, University College London.
Pseudocode | Yes | Algorithm 1: Sketch-DP/Sketch-TD
Open Source Code | Yes | Code is available at https://github.com/google-deepmind/sketch_dqn.
Open Datasets | Yes | Figure 5 shows the mean and median human-normalised performance on the Atari suite of environments (Bellemare et al., 2013) across 200M training frames.
Dataset Splits | No | The paper mentions training data, but does not explicitly provide details about training/validation/test splits for reproducibility.
Hardware Specification | Yes | with each agent running on a single V100 GPU.
Software Dependencies | No | The paper mentions "SciPy's minimize algorithm (Virtanen et al., 2020)", but does not list multiple key software components with specific version numbers for comprehensive reproducibility.
Experiment Setup | Yes | The results in Figure 5 use the sigmoid base feature κ(x) = 1/(e^x + 1) with slope s = 5, and 401 anchors (tuned from 101, 201 and 401) evenly spaced between −12 and 12. ... We use the exact same training procedure as QR-DQN (Dabney et al., 2018b). Notably, the learning rate, exploration schedule, and buffer design are all the same. We tried a small hyperparameter sweep on the learning rate, and found the default learning rate 0.00005 to be optimal for performance taken at 200 million frames.
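The feature configuration quoted in the Experiment Setup row (sigmoid base feature, slope 5, 401 anchors in [−12, 12]) can be illustrated with a short NumPy sketch. This is a minimal reconstruction for illustration only, not the authors' released implementation: the function name, the exact way the slope and anchors enter the base feature κ, and the array shapes are assumptions.

```python
import numpy as np

def sigmoid_features(values, num_anchors=401, low=-12.0, high=12.0, slope=5.0):
    """Map an array of return values to sigmoid features at evenly spaced anchors.

    values: array of shape (n,), e.g. sampled or backed-up returns.
    Returns an array of shape (n, num_anchors), one feature vector per value.
    """
    # 401 anchor points evenly spaced between -12 and 12, as quoted above.
    anchors = np.linspace(low, high, num_anchors)
    # Assumed form: base feature kappa(x) = 1 / (e^x + 1), evaluated at
    # x = slope * (value - anchor) so each anchor yields one sigmoid feature.
    x = slope * (values[:, None] - anchors[None, :])
    return 1.0 / (np.exp(x) + 1.0)

# Usage example: embed three return samples into the 401-dimensional feature space.
phi = sigmoid_features(np.array([-1.0, 0.0, 2.5]))
print(phi.shape)  # (3, 401)
```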