Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Distributional Reinforcement Learning via Moment Matching
Authors: Thanh Nguyen-Tang, Sunil Gupta, Svetha Venkatesh (pp. 9144-9152)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the suite of Atari games show that our method outperforms the distributional RL baselines and sets a new record in the Atari games for non-distributed agents. Experimental Results: We first present results with a tabular version of MMDRL to illustrate its behaviour in a distribution approximation task. We then combine the MMDRL update with the DQN-style architecture to create a novel deep RL algorithm, namely MMDQN, and evaluate it on the Atari-57 games. |
| Researcher Affiliation | Academia | Thanh Nguyen-Tang, Sunil Gupta, Svetha Venkatesh Applied Artificial Intelligence Institute (A2I2), Deakin University, Australia |
| Pseudocode | Yes | Algorithm 1: Generic MMDRL update |
| Open Source Code | Yes | Our official code is available at https://github.com/thanhnguyentang/mmdrl. |
| Open Datasets | Yes | We evaluated our algorithm on 55 Atari 2600 games (Bellemare et al. 2013) following the standard training and evaluation procedures (Mnih et al. 2015; van Hasselt, Guez, and Silver 2016). |
| Dataset Splits | No | The paper mentions following 'standard training and evaluation procedures (Mnih et al. 2015; van Hasselt, Guez, and Silver 2016)' but does not explicitly state specific training/validation/test dataset splits, percentages, or counts within its text. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'OpenAI Gym' and the 'Dopamine framework' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | For fair comparison with QR-DQN, we used the same hyperparameters: N = 200, Adam optimizer (Kingma and Ba 2015) with lr = 0.00005, ϵ_ADAM = 0.01/32. We used ϵ-greedy policy with ϵ being decayed at the same rate as in DQN but to a lower value ϵ = 0.01 as commonly used by the distributional RL methods. We used a target network to compute the distributional Bellman target as with DQN. |
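
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object, as in the minimal sketch below. The values for N, the Adam learning rate, the Adam epsilon, and the final exploration rate come from the table. The class name `MMDQNSetup`, the function `epsilon_at`, the starting exploration rate of 1.0, the decay length of 1,000,000 steps, and the linear shape of the schedule are illustrative assumptions; the table only says that ϵ is decayed "at the same rate as in DQN".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MMDQNSetup:
    """Hypothetical container for the hyperparameters quoted in the table."""

    n: int = 200                      # N = 200 (the table does not say what N counts)
    learning_rate: float = 0.00005    # Adam learning rate
    adam_epsilon: float = 0.01 / 32   # Adam epsilon
    final_epsilon: float = 0.01       # lower bound of the epsilon-greedy policy
    # Assumptions, not stated in the table:
    initial_epsilon: float = 1.0
    decay_steps: int = 1_000_000


def epsilon_at(step: int, cfg: MMDQNSetup) -> float:
    """Linearly anneal epsilon over cfg.decay_steps, then hold it at the floor."""
    fraction = min(max(step, 0) / cfg.decay_steps, 1.0)
    return cfg.initial_epsilon + fraction * (cfg.final_epsilon - cfg.initial_epsilon)


if __name__ == "__main__":
    cfg = MMDQNSetup()
    for step in (0, 500_000, 1_000_000, 2_000_000):
        print(step, round(epsilon_at(step, cfg), 4))
```

Freezing the dataclass keeps the quoted values from being changed by accident during a run. A different schedule can be tried by constructing `MMDQNSetup(decay_steps=...)` with another value.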