Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Distributional Reinforcement Learning With Quantile Regression
Authors: Will Dabney, Mark Rowland, Marc Bellemare, Rémi Munos
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now provide experimental results that demonstrate the practical advantages of minimizing the Wasserstein metric end-to-end, in contrast to the C51 approach. We use the 57 Atari 2600 games from the Arcade Learning Environment (ALE) (Bellemare et al. 2013). |
| Researcher Affiliation | Collaboration | Will Dabney (DeepMind), Mark Rowland (University of Cambridge), Marc G. Bellemare (Google Brain), Rémi Munos (DeepMind) |
| Pseudocode | Yes | Algorithm 1 Quantile Regression Q-Learning |
| Open Source Code | No | No explicit statement about providing open-source code or a link to a code repository for the methodology was found. |
| Open Datasets | Yes | We use the 57 Atari 2600 games from the Arcade Learning Environment (ALE) (Bellemare et al. 2013). |
| Dataset Splits | No | No explicit training/test/validation dataset splits with percentages, sample counts, or citations to predefined splits are provided for any single dataset. |
| Hardware Specification | No | No specific hardware (GPU/CPU models, memory, or cloud instances with specs) used for running experiments is mentioned. |
| Software Dependencies | No | The paper names optimizers (Adam, RMSProp) and builds on the DQN architecture, but it does not give version numbers for any software dependencies. |
| Experiment Setup | Yes | We performed hyper-parameter tuning over a set of five training games and evaluated on the full set of 57 games using these best settings (α = 0.00005, ϵ_ADAM = 0.01/32, and N = 200). As with DQN we use a target network when computing the distributional Bellman update. We also allow ϵ to decay at the same rate as in DQN, but to a lower value of 0.01, as is common in recent work (Bellemare, Dabney, and Munos 2017; Wang et al. 2016; van Hasselt, Guez, and Silver 2016). Our training procedure follows that of Mnih et al. (2015), and we present results under two evaluation protocols: best agent performance and online performance. |
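
The settings quoted in the Experiment Setup row can be gathered into a small configuration object, which may help anyone attempting a reimplementation. The following is a minimal sketch, not the authors' code: the class name `QRDQNSettings` and all field names are hypothetical, and the quantile-midpoint formula (2i − 1)/(2N) for i = 1..N is taken from the QR-DQN paper rather than from the table above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QRDQNSettings:
    """Hypothetical container for the hyper-parameters quoted in the table."""

    learning_rate: float = 0.00005    # alpha, as reported
    adam_epsilon: float = 0.01 / 32   # epsilon_ADAM, as reported
    num_quantiles: int = 200          # N, as reported
    final_exploration: float = 0.01   # value that the exploration epsilon decays to
    num_tuning_games: int = 5         # games used for hyper-parameter tuning
    num_eval_games: int = 57          # full Atari 2600 evaluation set

    def quantile_midpoints(self) -> List[float]:
        """Return the N quantile midpoints (2i - 1) / (2N) for i = 1..N."""
        n = self.num_quantiles
        # range(n) is 0-based, so 2 * i + 1 here equals 2 * (i + 1) - 1.
        return [(2 * i + 1) / (2 * n) for i in range(n)]


settings = QRDQNSettings()
midpoints = settings.quantile_midpoints()
```

With N = 200 the midpoints are evenly spaced in steps of 1/200, starting at 1/400. Settings the table does not report, such as the hardware and software versions, are deliberately left out of the sketch.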