Distributional Reinforcement Learning via Moment Matching

Authors: Thanh Nguyen-Tang, Sunil Gupta, Svetha Venkatesh (pp. 9144-9152)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on the suite of Atari games show that our method outperforms the distributional RL baselines and sets a new record in the Atari games for non-distributed agents. Experimental Results: We first present results with a tabular version of MMDRL to illustrate its behaviour in a distribution approximation task. We then combine the MMDRL update with the DQN-style architecture to create a novel deep RL algorithm, namely MMDQN, and evaluate it on the Atari-57 games.
Researcher Affiliation | Academia | Thanh Nguyen-Tang, Sunil Gupta, Svetha Venkatesh; Applied Artificial Intelligence Institute (A2I2), Deakin University, Australia
Pseudocode | Yes | Algorithm 1: Generic MMDRL update (a hedged code sketch of this update appears after the table).
Open Source Code | Yes | Our official code is available at https://github.com/thanhnguyentang/mmdrl.
Open Datasets | Yes | We evaluated our algorithm on 55 Atari 2600 games (Bellemare et al. 2013), following the standard training and evaluation procedures (Mnih et al. 2015; van Hasselt, Guez, and Silver 2016).
Dataset Splits | No | The paper mentions following 'standard training and evaluation procedures (Mnih et al. 2015; van Hasselt, Guez, and Silver 2016)' but does not explicitly state specific training/validation/test dataset splits, percentages, or counts within its text.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using 'OpenAI Gym' and the 'Dopamine framework' but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | For a fair comparison with QR-DQN, we used the same hyperparameters: N = 200 and the Adam optimizer (Kingma and Ba 2015) with lr = 0.00005 and ϵ_Adam = 0.01/32. We used an ϵ-greedy policy with ϵ decayed at the same rate as in DQN but to a lower value of ϵ = 0.01, as commonly used by distributional RL methods. We used a target network to compute the distributional Bellman target, as with DQN. (A hedged configuration sketch based on these values appears below.)
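
The Pseudocode row cites Algorithm 1 (the generic MMDRL update), but the algorithm itself is not reproduced on this page. Below is a minimal PyTorch sketch of the underlying idea: represent the return distribution by N particles and push the predicted particle set toward the distributional Bellman target by minimizing an empirical squared MMD under a mixture-of-Gaussians kernel. The function names (`gaussian_kernel`, `mmd_squared`, `mmdrl_update`), the bandwidth values, and the tensor shapes are our assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of an MMD-based distributional RL update (PyTorch).
# Kernel choice, bandwidths, and shapes are assumptions, not the paper's exact settings.
import torch

def gaussian_kernel(x, y, bandwidths=(1.0, 2.0, 4.0)):
    """Mixture of Gaussian kernels: sum_h exp(-(x - y)^2 / (2 h^2))."""
    d2 = (x.unsqueeze(-1) - y.unsqueeze(-2)) ** 2          # (batch, N, N) pairwise squared distances
    return sum(torch.exp(-d2 / (2.0 * h ** 2)) for h in bandwidths)

def mmd_squared(pred, target):
    """Biased empirical estimate of squared MMD between two particle sets.

    pred:   (batch, N) predicted particles Z_theta(s, a)
    target: (batch, N) Bellman-target particles r + gamma * Z_target(s', a')
    """
    k_pp = gaussian_kernel(pred, pred).mean(dim=(-2, -1))
    k_tt = gaussian_kernel(target, target).mean(dim=(-2, -1))
    k_pt = gaussian_kernel(pred, target).mean(dim=(-2, -1))
    return (k_pp + k_tt - 2.0 * k_pt).mean()

def mmdrl_update(pred_particles, rewards, dones, next_particles, gamma=0.99):
    """One generic update: minimize squared MMD to the (detached) Bellman target."""
    target = rewards.unsqueeze(-1) + gamma * (1.0 - dones.unsqueeze(-1)) * next_particles
    return mmd_squared(pred_particles, target.detach())
```

In an MMDQN-style agent, `pred_particles` would come from a DQN-style convolutional network outputting N particles per action, and `next_particles` from a target network evaluated at the greedy next action; this wiring is a typical-implementation assumption rather than a detail confirmed by the excerpt above.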
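
For the Experiment Setup row, the reported hyperparameters can be collected into a small configuration sketch. The dictionary below only restates the values quoted from the paper (N = 200, Adam with lr = 0.00005 and ϵ_Adam = 0.01/32, final exploration ϵ = 0.01, target network); the key names and the commented optimizer call are illustrative, not the authors' code.

```python
# Hyperparameters as quoted in the Experiment Setup row; key names are ours.
import torch

MMDQN_HPARAMS = {
    "num_particles": 200,          # N = 200, matching QR-DQN
    "learning_rate": 0.00005,      # Adam learning rate
    "adam_eps": 0.01 / 32,         # epsilon_Adam = 0.01 / 32
    "epsilon_greedy_final": 0.01,  # epsilon decayed as in DQN, but down to 0.01
    "use_target_network": True,    # distributional Bellman target via a target network
}

# Example (assuming some `model`):
# optimizer = torch.optim.Adam(model.parameters(),
#                              lr=MMDQN_HPARAMS["learning_rate"],
#                              eps=MMDQN_HPARAMS["adam_eps"])
```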