Distributional Reinforcement Learning With Quantile Regression
Authors: Will Dabney, Mark Rowland, Marc Bellemare, Rémi Munos
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now provide experimental results that demonstrate the practical advantages of minimizing the Wasserstein metric end-to-end, in contrast to the C51 approach. We use the 57 Atari 2600 games from the Arcade Learning Environment (ALE) (Bellemare et al. 2013). |
| Researcher Affiliation | Collaboration | Will Dabney (DeepMind), Mark Rowland (University of Cambridge), Marc G. Bellemare (Google Brain), Rémi Munos (DeepMind) |
| Pseudocode | Yes | Algorithm 1 Quantile Regression Q-Learning (a hedged sketch of this update follows the table) |
| Open Source Code | No | No explicit statement about providing open-source code or a link to a code repository for the methodology was found. |
| Open Datasets | Yes | We use the 57 Atari 2600 games from the Arcade Learning Environment (ALE) (Bellemare et al. 2013). |
| Dataset Splits | No | No explicit training/test/validation dataset splits with percentages, sample counts, or citations to predefined splits are provided for any single dataset. |
| Hardware Specification | No | No specific hardware (GPU/CPU models, memory, or cloud instances with specs) used for running experiments is mentioned. |
| Software Dependencies | No | The paper names optimizers (Adam, RMSProp) and the DQN network architecture, but does not provide version numbers for any software dependencies. |
| Experiment Setup | Yes | We performed hyper-parameter tuning over a set of five training games and evaluated on the full set of 57 games using these best settings (α = 0.00005, ϵ_ADAM = 0.01/32, and N = 200). As with DQN we use a target network when computing the distributional Bellman update. We also allow ϵ to decay at the same rate as in DQN, but to a lower value of 0.01, as is common in recent work (Bellemare, Dabney, and Munos 2017; Wang et al. 2016; van Hasselt, Guez, and Silver 2016). Our training procedure follows that of Mnih et al. (2015), and we present results under two evaluation protocols: best agent performance and online performance. (A config sketch of these reported settings follows the table.) |
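
The "Pseudocode" row above points to Algorithm 1 (Quantile Regression Q-Learning). Since the paper provides only pseudocode and no released code, the following is a minimal NumPy sketch of the quantile Huber loss and the distributional Bellman target it describes; function and variable names are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of the Quantile Regression Q-Learning update (Algorithm 1).
# Names and structure are illustrative; the authors did not release code.
import numpy as np

def quantile_huber_loss(theta, target, kappa=1.0):
    """Quantile regression loss between N predicted quantiles `theta` of
    Z(x, a) and N target samples `target` = r + gamma * theta'(x', a*).
    Assumes kappa > 0 (kappa = 0 reduces to the plain quantile loss)."""
    N = theta.shape[0]
    tau_hat = (2 * np.arange(N) + 1) / (2.0 * N)          # quantile midpoints
    # Pairwise TD errors u[i, j] = T theta_j - theta_i
    u = target[None, :] - theta[:, None]
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))   # Huber loss L_kappa
    # Asymmetric weight |tau_i - 1{u < 0}| makes the loss a quantile regression
    weight = np.abs(tau_hat[:, None] - (u < 0).astype(float))
    return (weight * huber / kappa).mean(axis=1).sum()    # sum_i E_j[rho_tau(u)]

def qr_q_learning_target(theta_next_greedy, r, gamma):
    """Distributional Bellman target T theta_j = r + gamma * theta_j(x', a*),
    where `theta_next_greedy` holds the quantiles of the greedy next action."""
    return r + gamma * theta_next_greedy
```

The "Experiment Setup" row quotes the reported hyper-parameters. As a convenience, the sketch below collects them into a plain configuration dictionary; the key names are assumptions, since the paper does not release a configuration file.

```python
# Reported QR-DQN settings gathered into an illustrative config dict.
QR_DQN_CONFIG = {
    "num_quantiles": 200,          # N = 200
    "adam_learning_rate": 5e-5,    # alpha = 0.00005
    "adam_epsilon": 0.01 / 32,     # epsilon_ADAM
    "epsilon_greedy_final": 0.01,  # exploration epsilon decays to 0.01, as in the quote
    "target_network": True,        # target network used for the distributional Bellman update
}
```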
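Both sketches are intended only to make the quoted evidence concrete; they should not be read as the paper's released implementation, which does not exist publicly according to the "Open Source Code" row.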