Distributional Bellman Operators over Mean Embeddings
Authors: Li Kevin Wenliang, Gregoire Deletang, Matthew Aitchison, Marcus Hutter, Anian Ruoss, Arthur Gretton, Mark Rowland
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide asymptotic convergence theory using a novel error analysis approach, and examine the empirical performance of the algorithms on a suite of tabular tasks. Further, we show that this approach can be straightforwardly combined with deep reinforcement learning to give competitive performances. |
| Researcher Affiliation | Collaboration | ¹Google DeepMind, ²Gatsby Unit, University College London. |
| Pseudocode | Yes | Algorithm 1 Sketch-DP/Sketch-TD |
| Open Source Code | Yes | Code is available at https://github.com/google-deepmind/sketch_dqn. |
| Open Datasets | Yes | Figure 5 shows the mean and median human-normalised performance on the Atari suite of environments (Bellemare et al., 2013) across 200M training frames |
| Dataset Splits | No | The paper mentions training data, but does not explicitly provide details about training/validation/test splits for reproducibility. |
| Hardware Specification | Yes | with each agent running on a single V100 GPU. |
| Software Dependencies | No | The paper mentions "SciPy's minimize algorithm (Virtanen et al., 2020)", but does not list multiple key software components with specific version numbers for comprehensive reproducibility. |
| Experiment Setup | Yes | The results in Figure 5 uses the sigmoid base feature κ(x) = 1/(1 + e^{-x}) with slope s = 5, and the anchors to be 401 (tuned from 101, 201 and 401) evenly spaced points between -12 and 12. ... We use the exact same training procedure as QR-DQN (Dabney et al., 2018b). Notably, the learning rate, exploration schedule, buffer design are all the same. We tried a small hyperparameter sweep on the learning rate, and found the default learning rate 0.00005 to be optimal for performance taken at 200 million frames. |
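
The feature configuration quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not taken from the released code. It assumes that slope and anchors combine as φ_i(x) = κ(s(x - x_i)), which the row above does not state explicitly, and that a mean embedding is approximated by averaging features over sampled returns. The function names `make_anchors`, `sketch_features` and `mean_embedding` are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function kappa(x) = 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def make_anchors(num_anchors=401, low=-12.0, high=12.0):
    """Evenly spaced anchor points on [low, high], endpoints included."""
    step = (high - low) / (num_anchors - 1)
    return [low + i * step for i in range(num_anchors)]


def sketch_features(x, anchors, slope=5.0):
    """Assumed feature map: phi_i(x) = kappa(slope * (x - anchor_i))."""
    return [sigmoid(slope * (x - a)) for a in anchors]


def mean_embedding(returns, anchors, slope=5.0):
    """Average the feature vector over a list of sampled returns."""
    sums = [0.0] * len(anchors)
    for g in returns:
        for i, f in enumerate(sketch_features(g, anchors, slope)):
            sums[i] += f
    return [s / len(returns) for s in sums]


anchors = make_anchors()              # 401 points between -12 and 12
phi = sketch_features(0.0, anchors)   # features of a return equal to 0
embedding = mean_embedding([-1.0, 0.0, 2.5], anchors)
```

Under this assumption each feature is a smoothed step located at its anchor, so the averaged vector behaves like a smoothed cumulative distribution function of the return evaluated at the anchors.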