A Natural Actor-Critic Framework for Zero-Sum Markov Games
Authors: Ahmet Alacaoglu, Luca Viano, Niao He, Volkan Cevher
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide numerical verification of our methods for a two-player bandit environment and a two-player game, Alesia. We observe improved empirical performance compared to the recently proposed optimistic gradient descent-ascent variant for Markov games. |
| Researcher Affiliation | Academia | Ahmet Alacaoglu¹, Luca Viano², Niao He³, Volkan Cevher²; ¹University of Wisconsin-Madison, ²EPFL, ³ETH Zürich. |
| Pseudocode | Yes | Algorithm 1: Reflected NAC with a game etiquette and ζ-greedy exploration; Algorithm 2: V_N = Policy-Eval-V(x, y, N, β); Algorithm 3: θ̂_N = Policy-Eval-θ(x, y, N, V̂, β) for player y; Algorithm 4: ν̂_N = Policy-Eval-ν(x, y, N, β) for player y. (A simplified sketch of this actor-critic structure is given below the table.) |
| Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | In particular, we consider two domains: a two-player bandit environment with 100 arms and a board game known as Alesia (Perolat et al., 2015). |
| Dataset Splits | No | The paper evaluates its algorithms on reinforcement learning environments and reports average results over multiple seeds ('averaged over 10 seeds' for bandits, 'averaged over 5 seeds' for Alesia), but it does not specify explicit training, validation, or test dataset splits. |
| Hardware Specification | No | The paper mentions that 'Part of the work was done while A. Alacaoglu was at EPFL' and includes funding acknowledgements, but it does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper provides tables of hyperparameters for its algorithms and baseline methods (e.g., Table 2, Table 3), but it does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch x.x). |
| Experiment Setup | Yes | I.2. Hyperparameter selection. In the following we report the hyperparameters chosen for the experiments to ensure reproducibility. Table 2: Hyperparameters for Reflected NAC in the two-player bandit environment (e.g., η = 0.023, K = 100, N = 10, T = 10). (These settings are echoed in the reproduction sketch below the table.) |
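
The Pseudocode row above names a reflected natural actor-critic loop built from separate policy-evaluation subroutines. As a point of reference only, here is a minimal Python sketch of that general structure on a one-shot zero-sum matrix game, assuming a softmax parameterization for which the natural policy step reduces to a multiplicative-weights update. The function names and the sampling scheme are simplified placeholders inspired by the algorithm listings, not the authors' Algorithms 1-4.

```python
import numpy as np

# Hypothetical sketch: names echo the Policy-Eval routines in the paper's algorithm
# boxes, but the bodies are simplified Monte-Carlo estimates for a one-shot
# (matrix-game) setting, not the authors' procedures for Markov games.

def policy_eval_q(payoff, x, y, n_samples, rng):
    """Estimate each player's action values against the opponent's current policy
    by sampling opponent actions (a stand-in for the policy-evaluation step)."""
    q_x = np.zeros(payoff.shape[0])
    q_y = np.zeros(payoff.shape[1])
    for _ in range(n_samples):
        a_y = rng.choice(len(y), p=y)
        a_x = rng.choice(len(x), p=x)
        q_x += payoff[:, a_y] / n_samples    # row player maximizes the payoff
        q_y += -payoff[a_x, :] / n_samples   # column player minimizes it
    return q_x, q_y

def nac_step(x, y, q_x, q_y, eta):
    """Natural-gradient actor update: with a softmax parameterization the natural
    policy step reduces to exponentiating the estimated action values."""
    x_new = x * np.exp(eta * q_x)
    y_new = y * np.exp(eta * q_y)
    return x_new / x_new.sum(), y_new / y_new.sum()

# Usage example: one evaluation + update round on matching pennies.
rng = np.random.default_rng(0)
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x = np.ones(2) / 2
y = np.ones(2) / 2
q_x, q_y = policy_eval_q(A, x, y, n_samples=10, rng=rng)
x, y = nac_step(x, y, q_x, q_y, eta=0.023)
print(x, y)
```

The split between `policy_eval_q` and `nac_step` mirrors only the critic/actor separation suggested by the listings; in the paper the evaluation routines return value and parameter estimates for a Markov game rather than a single payoff matrix.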
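
The Experiment Setup row quotes the Table 2 values for the two-player bandit environment, and the Dataset Splits row notes averaging over 10 seeds. Below is a hedged sketch of what such a seed-averaged harness could look like with those values. The random payoff matrix, the exploitability metric, the way K, N, T combine into an iteration budget, and the plain multiplicative-weights update are all assumptions made for illustration, not the paper's protocol.

```python
import numpy as np

# Hypothetical harness mirroring the reported setup: a two-player zero-sum bandit
# (matrix game) with 100 arms, the Table 2 values eta=0.023, K=100, N=10, T=10,
# and averaging over 10 random seeds.  The update is a plain multiplicative-weights
# step used as a stand-in for the paper's method.

HPARAMS = {"eta": 0.023, "K": 100, "N": 10, "T": 10}  # values quoted from Table 2
N_ARMS, N_SEEDS = 100, 10

def duality_gap(A, x, y):
    """Exploitability of the strategy pair (x, y) in the zero-sum game A."""
    return float(np.max(A @ y) - np.min(x @ A))

gaps = []
for seed in range(N_SEEDS):
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1.0, 1.0, size=(N_ARMS, N_ARMS))  # assumed random payoffs
    x = np.ones(N_ARMS) / N_ARMS                        # row player (maximizer)
    y = np.ones(N_ARMS) / N_ARMS                        # column player (minimizer)
    # Assumption: treat T * K as the total iteration budget; N (evaluation samples)
    # is unused here because exact expected payoffs are computed.
    for _ in range(HPARAMS["T"] * HPARAMS["K"]):
        x_new = x * np.exp(HPARAMS["eta"] * (A @ y))    # ascent for the row player
        y_new = y * np.exp(-HPARAMS["eta"] * (x @ A))   # descent for the column player
        x, y = x_new / x_new.sum(), y_new / y_new.sum()
    gaps.append(duality_gap(A, x, y))

print(f"duality gap: {np.mean(gaps):.4f} +/- {np.std(gaps):.4f} over {N_SEEDS} seeds")
```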