Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Distributional Bellman Operators over Mean Embeddings
Authors: Li Kevin Wenliang, Gregoire Deletang, Matthew Aitchison, Marcus Hutter, Anian Ruoss, Arthur Gretton, Mark Rowland
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide asymptotic convergence theory using a novel error analysis approach, and examine the empirical performance of the algorithms on a suite of tabular tasks. Further, we show that this approach can be straightforwardly combined with deep reinforcement learning to give competitive performances. |
| Researcher Affiliation | Collaboration | 1 Google DeepMind, 2 Gatsby Unit, University College London. |
| Pseudocode | Yes | Algorithm 1 Sketch-DP/Sketch-TD |
| Open Source Code | Yes | Code is available at https://github.com/google-deepmind/sketch_dqn. |
| Open Datasets | Yes | Figure 5 shows the mean and median human-normalised performance on the Atari suite of environments (Bellemare et al., 2013) across 200M training frames |
| Dataset Splits | No | The paper mentions training data, but does not explicitly provide details about training/validation/test splits for reproducibility. |
| Hardware Specification | Yes | with each agent running on a single V100 GPU. |
| Software Dependencies | No | The paper mentions "SciPy's MINIMIZE algorithm (Virtanen et al., 2020)", but does not list multiple key software components with specific version numbers for comprehensive reproducibility. |
| Experiment Setup | Yes | The results in Figure 5 uses the sigmoid base feature κ(x) = 1/(e^{-x} + 1) with slope s = 5, and the anchors to be 401 (tuned from 101, 201 and 401) evenly spaced points between -12 and 12. ... We use the exact same training procedure as QR-DQN (Dabney et al., 2018b). Notably, the learning rate, exploration schedule, buffer design are all the same. We tried a small hyperparameter sweep on the learning rate, and found the default learning rate 0.00005 to be optimal for performance taken at 200 million frames. |
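
The feature configuration quoted in the Experiment Setup row can be illustrated with a minimal sketch. It is not taken from the authors' repository. It assumes that κ is the standard logistic sigmoid, that the anchors span a range symmetric about zero ending at 12, and that slope and anchors combine as φ_i(x) = κ(s(x − x_i)), a form this page does not spell out. The function names `make_anchors`, `features`, and `mean_embedding` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic sigmoid 1 / (exp(-x) + 1), evaluated in a numerically stable way."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def make_anchors(num_anchors=401, low=-12.0, high=12.0):
    """Evenly spaced anchor points on [low, high] (endpoints included)."""
    step = (high - low) / (num_anchors - 1)
    return [low + i * step for i in range(num_anchors)]


def features(x, anchors, slope=5.0):
    """ASSUMPTION: phi_i(x) = sigmoid(slope * (x - anchor_i)); not confirmed by this page."""
    return [sigmoid(slope * (x - a)) for a in anchors]


def mean_embedding(samples, anchors, slope=5.0):
    """Average the feature vectors of a set of return samples (an empirical mean embedding)."""
    total = [0.0] * len(anchors)
    for x in samples:
        for i, value in enumerate(features(x, anchors, slope)):
            total[i] += value
    return [t / len(samples) for t in total]


anchors = make_anchors()
phi = features(0.0, anchors)
embedding = mean_embedding([-1.0, 0.0, 1.0], anchors)
```

Under these assumptions, each return value maps to a 401-dimensional vector whose entries fall from near 1 to near 0 as the anchor passes the return, and a return distribution is summarised by the average of those vectors over its samples.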