A Natural Actor-Critic Framework for Zero-Sum Markov Games
Authors: Ahmet Alacaoglu, Luca Viano, Niao He, Volkan Cevher
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide numerical verification of our methods for a two-player bandit environment and a two-player game, Alesia. We observe improved empirical performance compared to the recently proposed optimistic gradient descent-ascent variant for Markov games. |
| Researcher Affiliation | Academia | Ahmet Alacaoglu¹, Luca Viano², Niao He³, Volkan Cevher²; ¹University of Wisconsin-Madison, ²EPFL, ³ETH Zürich. |
| Pseudocode | Yes | Algorithm 1: Reflected NAC with a game etiquette and ζ-greedy exploration; Algorithm 2: V_N = Policy-Eval-V(x, y, N, β); Algorithm 3: θ̂_N = Policy-Eval-θ(x, y, N, V̂, β) for player y; Algorithm 4: ν̂_N = Policy-Eval-ν(x, y, N, β) for player y. (A simplified sketch of this actor-critic structure is given below the table.) |
| Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | In particular, we consider two domains: a two-player bandit environment with 100 arms and a board game known as Alesia (Perolat et al., 2015). |
| Dataset Splits | No | The paper evaluates its algorithms on reinforcement learning environments and reports average results over multiple seeds ('averaged over 10 seeds' for bandits, 'averaged over 5 seeds' for Alesia), but it does not specify explicit training, validation, or test dataset splits. |
| Hardware Specification | No | The paper mentions that 'Part of the work was done while A. Alacaoglu was at EPFL' and includes funding acknowledgements, but it does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper provides tables of hyperparameters for its algorithms and baseline methods (e.g., Table 2, Table 3), but it does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch x.x). |
| Experiment Setup | Yes | I.2. Hyperparameter selection. In the following we report the hyperparameters chosen for the experiments to ensure reproducibility. Table 2: Hyperparameters for Reflected NAC in the two-player bandit environment (e.g., η = 0.023, K = 100, N = 10, T = 10). (These settings are echoed in the reproduction sketch below the table.) |
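
The Pseudocode row above names a reflected natural actor-critic loop built from separate policy-evaluation subroutines. As a point of reference only, here is a minimal Python sketch of that general structure on a one-shot zero-sum matrix game, assuming a softmax parameterization for which the natural policy step reduces to a multiplicative-weights update. The function names and the sampling scheme are simplified placeholders inspired by the algorithm listings, not the authors' Algorithms 1-4.

```python
import numpy as np

# Hypothetical sketch: names echo the Policy-Eval routines in the paper's algorithm
# boxes, but the bodies are simplified Monte-Carlo estimates for a one-shot
# (matrix-game) setting, not the authors' procedures for Markov games.

def policy_eval_q(payoff, x, y, n_samples, rng):
    """Estimate each player's action values against the opponent's current policy
    by sampling opponent actions (a stand-in for the policy-evaluation step)."""
    q_x = np.zeros(payoff.shape[0])
    q_y = np.zeros(payoff.shape[1])
    for _ in range(n_samples):
        a_y = rng.choice(len(y), p=y)
        a_x = rng.choice(len(x), p=x)
        q_x += payoff[:, a_y] / n_samples    # row player maximizes the payoff
        q_y += -payoff[a_x, :] / n_samples   # column player minimizes it
    return q_x, q_y

def nac_step(x, y, q_x, q_y, eta):
    """Natural-gradient actor update: with a softmax parameterization the natural
    policy step reduces to exponentiating the estimated action values."""
    x_new = x * np.exp(eta * q_x)
    y_new = y * np.exp(eta * q_y)
    return x_new / x_new.sum(), y_new / y_new.sum()

# Usage example: one evaluation + update round on matching pennies.
rng = np.random.default_rng(0)
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x = np.ones(2) / 2
y = np.ones(2) / 2
q_x, q_y = policy_eval_q(A, x, y, n_samples=10, rng=rng)
x, y = nac_step(x, y, q_x, q_y, eta=0.023)
print(x, y)
```

The split between `policy_eval_q` and `nac_step` mirrors only the critic/actor separation suggested by the listings; in the paper the evaluation routines return value and parameter estimates for a Markov game rather than a single payoff matrix.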
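
The Experiment Setup row quotes the Table 2 values for the two-player bandit environment, and the Dataset Splits row notes averaging over 10 seeds. Below is a hedged sketch of what such a seed-averaged harness could look like with those values. The random payoff matrix, the exploitability metric, the way K, N, T combine into an iteration budget, and the plain multiplicative-weights update are all assumptions made for illustration, not the paper's protocol.

```python
import numpy as np

# Hypothetical harness mirroring the reported setup: a two-player zero-sum bandit
# (matrix game) with 100 arms, the Table 2 values eta=0.023, K=100, N=10, T=10,
# and averaging over 10 random seeds.  The update is a plain multiplicative-weights
# step used as a stand-in for the paper's method.

HPARAMS = {"eta": 0.023, "K": 100, "N": 10, "T": 10}  # values quoted from Table 2
N_ARMS, N_SEEDS = 100, 10

def duality_gap(A, x, y):
    """Exploitability of the strategy pair (x, y) in the zero-sum game A."""
    return float(np.max(A @ y) - np.min(x @ A))

gaps = []
for seed in range(N_SEEDS):
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1.0, 1.0, size=(N_ARMS, N_ARMS))  # assumed random payoffs
    x = np.ones(N_ARMS) / N_ARMS                        # row player (maximizer)
    y = np.ones(N_ARMS) / N_ARMS                        # column player (minimizer)
    # Assumption: treat T * K as the total iteration budget; N (evaluation samples)
    # is unused here because exact expected payoffs are computed.
    for _ in range(HPARAMS["T"] * HPARAMS["K"]):
        x_new = x * np.exp(HPARAMS["eta"] * (A @ y))    # ascent for the row player
        y_new = y * np.exp(-HPARAMS["eta"] * (x @ A))   # descent for the column player
        x, y = x_new / x_new.sum(), y_new / y_new.sum()
    gaps.append(duality_gap(A, x, y))

print(f"duality gap: {np.mean(gaps):.4f} +/- {np.std(gaps):.4f} over {N_SEEDS} seeds")
```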