Distributional Reinforcement Learning via Moment Matching
Authors: Thanh Nguyen-Tang, Sunil Gupta, Svetha Venkatesh
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the suite of Atari games show that our method outperforms the distributional RL baselines and sets a new record in the Atari games for non-distributed agents. Experimental Results: We first present results with a tabular version of MMDRL to illustrate its behaviour in a distribution approximation task. We then combine the MMDRL update with the DQN-style architecture to create a novel deep RL algorithm, namely MMDQN, and evaluate it on the Atari-57 games. |
| Researcher Affiliation | Academia | Thanh Nguyen-Tang, Sunil Gupta, Svetha Venkatesh Applied Artificial Intelligence Institute (A2I2), Deakin University, Australia |
| Pseudocode | Yes | Algorithm 1: Generic MMDRL update (a hedged sketch of the corresponding MMD particle loss appears below the table) |
| Open Source Code | Yes | Our official code is available at https://github.com/thanhnguyentang/mmdrl. |
| Open Datasets | Yes | We evaluated our algorithm on 55 Atari 2600 games (Bellemare et al. 2013) following the standard training and evaluation procedures (Mnih et al. 2015; van Hasselt, Guez, and Silver 2016) |
| Dataset Splits | No | The paper mentions following 'standard training and evaluation procedures (Mnih et al. 2015; van Hasselt, Guez, and Silver 2016)' but does not explicitly state specific training/validation/test dataset splits, percentages, or counts within its text. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'Open AI Gym' and the 'Dopamine framework' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | For fair comparison with QR-DQN, we used the same hyperparameters: N = 200, Adam optimizer (Kingma and Ba 2015) with lr = 0.00005, ϵ_Adam = 0.01/32. We used an ϵ-greedy policy with ϵ decayed at the same rate as in DQN but to a lower value ϵ = 0.01, as commonly used by distributional RL methods. We used a target network to compute the distributional Bellman target, as with DQN. (These values are gathered into a configuration sketch after the table.) |
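
The pseudocode row above refers to Algorithm 1, the generic MMDRL update, which matches the predicted return particles to the distributional Bellman target particles under a maximum mean discrepancy criterion. The snippet below is a minimal illustrative sketch of such a squared-MMD particle loss; the Gaussian kernel, single bandwidth, PyTorch layout, and all variable names are assumptions made here for illustration, not the authors' implementation (the official repository linked above is authoritative).

```python
# Illustrative sketch only: an empirical squared-MMD loss between predicted
# return particles and distributional Bellman target particles, in the spirit
# of a generic MMDRL-style update. Kernel choice and all names are assumptions.
import torch

def gaussian_kernel(x, y, bandwidth=1.0):
    # x: (B, N), y: (B, M) -> pairwise Gaussian kernel values, shape (B, N, M).
    diff = x.unsqueeze(-1) - y.unsqueeze(-2)
    return torch.exp(-diff.pow(2) / (2.0 * bandwidth ** 2))

def mmd_squared(pred, target, bandwidth=1.0):
    # Biased (V-statistic) estimate of MMD^2 between the two particle sets,
    # averaged over the batch.
    k_pp = gaussian_kernel(pred, pred, bandwidth).mean(dim=(-2, -1))
    k_pt = gaussian_kernel(pred, target, bandwidth).mean(dim=(-2, -1))
    k_tt = gaussian_kernel(target, target, bandwidth).mean(dim=(-2, -1))
    return (k_pp - 2.0 * k_pt + k_tt).mean()

if __name__ == "__main__":
    B, N, gamma = 32, 200, 0.99                          # batch size, particles, discount
    pred = torch.randn(B, N, requires_grad=True)         # predicted particles of Z_theta(x, a)
    reward = torch.randn(B, 1)
    next_particles = torch.randn(B, N)                   # particles of Z(x', a*) from a target net
    target = (reward + gamma * next_particles).detach()  # distributional Bellman target
    loss = mmd_squared(pred, target)
    loss.backward()                                      # gradients flow to the predicted particles
```

In a deep-RL setting the predicted particles would be the output head of a DQN-style network rather than free tensors, and the loss would be minimized with the optimizer settings quoted in the experiment-setup row.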
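For the experiment-setup row, the following is a minimal sketch that collects the quoted MMDQN hyperparameters into a single configuration mapping. Only the numerical values come from the paper; the key names and the dict structure are hypothetical.

```python
# Hypothetical configuration sketch; the values are those quoted above,
# the key names are assumptions made for illustration.
mmdqn_config = {
    "num_particles": 200,            # N, matching QR-DQN for a fair comparison
    "optimizer": "adam",
    "learning_rate": 0.00005,        # lr
    "adam_epsilon": 0.01 / 32,       # eps_Adam
    "epsilon_greedy_final": 0.01,    # decayed at the same rate as in DQN
    "use_target_network": True,      # target net computes the distributional Bellman target
}
```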