DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm

Authors: Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Remi Munos, Bernardo Avila Pires, Michal Valko

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | When combined with the IMPALA architecture, DoMo-AC showed improvements over the baseline algorithm on the Atari-57 game benchmark.
Researcher Affiliation | Industry | Google DeepMind; Omron Sinic X. Correspondence to: Yunhao Tang <robintyh@deepmind.com>.
Pseudocode | Yes | Algorithm 1: Doubly multi-step off-policy actor-critic (DoMo-AC).
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the methodology is openly available.
Open Datasets | Yes | All evaluation environments are drawn from the full suite of Atari games (Bellemare et al., 2013), consisting of 57 levels.
Dataset Splits | No | The paper does not provide specific details on dataset splits (e.g., percentages, sample counts) for training, validation, or testing.
Hardware Specification | No | The paper mentions 'a central GPU learner and N = 512 distributed CPU actors' but does not give specific models or specifications for the GPU or CPU hardware used.
Software Dependencies | No | The paper mentions using 'RMSProp optimizers (Tieleman et al., 2012)' but does not provide version numbers for any software libraries, frameworks, or programming languages used.
Experiment Setup | Yes | The policy and value networks are both trained with RMSProp optimizers (Tieleman et al., 2012) using learning rate α = 5 × 10^-4 and no momentum. To encourage exploration, the policy loss is augmented with an entropy regularization term with coefficient c_e = 0.01 and a baseline (value) loss with coefficient c_v = 0.5, i.e., the full loss is L = L_policy + c_v L_value + c_e L_entropy.
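The Experiment Setup row quotes the paper's optimization hyperparameters. As a minimal illustrative sketch (not the authors' implementation), the code below shows how such a combined loss L = L_policy + c_v L_value + c_e L_entropy and the RMSProp setting could be wired up in JAX with optax. The coefficients c_v = 0.5, c_e = 0.01 and learning rate 5 × 10^-4 come from the quoted setup; the toy network, placeholder advantage and return estimates, and batch fields are assumptions for illustration only, and the paper's actual DoMo-AC policy-gradient and multi-step value estimators are not reproduced here.

```python
# Minimal sketch (assumptions, not the authors' code) of combining
# L = L_policy + c_v * L_value + c_e * L_entropy and training with RMSProp
# (learning rate 5e-4, no momentum). The tiny network, advantage/return
# placeholders, and batch fields are hypothetical.
import jax
import jax.numpy as jnp
import optax

C_V, C_E = 0.5, 0.01          # coefficients reported in the paper
LEARNING_RATE = 5e-4          # RMSProp learning rate, no momentum

def apply_net(params, obs):
    """Toy shared torso producing policy logits and a value estimate."""
    hidden = jnp.tanh(obs @ params["w_h"])
    logits = hidden @ params["w_pi"]
    value = (hidden @ params["w_v"]).squeeze(-1)
    return logits, value

def loss_fn(params, batch):
    logits, value = apply_net(params, batch["obs"])
    log_probs = jax.nn.log_softmax(logits)
    taken = jnp.take_along_axis(
        log_probs, batch["actions"][:, None], axis=1).squeeze(-1)
    # Placeholder advantage; the paper uses multi-step off-policy estimators.
    adv = batch["returns"] - value
    l_policy = -(jax.lax.stop_gradient(adv) * taken).mean()
    l_value = 0.5 * ((batch["returns"] - value) ** 2).mean()
    # Negative entropy, so minimizing the loss encourages exploration.
    l_entropy = (jnp.exp(log_probs) * log_probs).sum(-1).mean()
    return l_policy + C_V * l_value + C_E * l_entropy

optimizer = optax.rmsprop(LEARNING_RATE, momentum=None)

@jax.jit
def update(params, opt_state, batch):
    loss, grads = jax.value_and_grad(loss_fn)(params, batch)
    updates, opt_state = optimizer.update(grads, opt_state)
    return optax.apply_updates(params, updates), opt_state, loss

# Dummy shapes just to show the call pattern.
key = jax.random.PRNGKey(0)
params = {
    "w_h": jax.random.normal(key, (8, 32)) * 0.1,
    "w_pi": jax.random.normal(key, (32, 4)) * 0.1,
    "w_v": jax.random.normal(key, (32, 1)) * 0.1,
}
batch = {
    "obs": jnp.ones((16, 8)),
    "actions": jnp.zeros((16,), dtype=jnp.int32),
    "returns": jnp.ones((16,)),
}
opt_state = optimizer.init(params)
params, opt_state, loss = update(params, opt_state, batch)
```

The point of the sketch is only the weighting of the three loss terms and the momentum-free RMSProp configuration; in the paper these components sit inside an IMPALA-style distributed learner rather than the single-batch update shown here.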