Newton Optimization on Helmholtz Decomposition for Continuous Games
Authors: Giorgia Ramponi, Marcello Restelli (pp. 11325-11333)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we empirically compare the NOHD's performance with state-of-the-art algorithms on some bimatrix games and in a continuous Gridworld environment. Finally, in Section 6, we analyze the empirical performance of NOHD when agents optimize a Boltzmann policy in three bimatrix games: Prisoner's Dilemma, Matching Pennies, and Rock-Paper-Scissors. In the last experiment, we study the learning performance of NOHD in two continuous gridworld environments. In all experiments, NOHD achieves great results confirming the quadratic nature of the update. |
| Researcher Affiliation | Academia | Giorgia Ramponi, Marcello Restelli Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano Piazza Leonardo da Vinci, 32, 20133, Milano, Italy {giorgia.ramponi, marcello.restelli}@polimi.it |
| Pseudocode | Yes | Algorithm 1 NOHD (see the sketch after the table) |
| Open Source Code | No | The paper does not provide a specific link or explicit statement about the release of its source code for the described methodology. |
| Open Datasets | Yes | The first gridworld is the continuous version of the second gridworld proposed in (Hu and Wellman 2003): the two agents are initialized in the two opposite lower corners and have to reach the same goal; when one of the two agents reaches the goal, the game ends, and this agent gets a positive reward. |
| Dataset Splits | No | The paper mentions experimental settings like initializations, number of runs, and sampling trajectories, but does not provide specific details on dataset splits (e.g., percentages or counts) for training, validation, or test sets. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory, or cloud instances) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or solvers used in the experiments. |
| Experiment Setup | Yes | For each game, we perform experiments with learning rates 0.1, 0.5, 1.0. In Matching Pennies we initialize probabilities to [0.86, 0.14] for the first agent and to [0.14, 0.86] for the second agent; instead in Rock Paper Scissors to [0.66, 0.24, 0.1]. We performed 20 runs for each setting. In each iteration, we sampled 300 trajectories of length 1. The agents' policies are Gaussian policies, linear in a set of respectively 72 and 68 radial basis functions, which generate the ν angle for the step's direction. (A minimal bimatrix sketch follows the table.) |
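The Pseudocode row above cites Algorithm 1 (NOHD), which this table quotes only by name. As a rough orientation, the sketch below shows a Newton-type step built on the Helmholtz-style split of the game Jacobian into a symmetric ("potential") part and an antisymmetric ("Hamiltonian") part. It is an assumption-laden illustration, not the authors' update rule: the toy two-player game, the damping term `lam`, and the way the two parts are recombined are all placeholders.

```python
# Minimal sketch (illustration only): a Newton-type step on the Helmholtz-style
# split of the game Jacobian into symmetric ("potential") and antisymmetric
# ("Hamiltonian") parts. This is NOT the paper's Algorithm 1; the toy game,
# the damping term `lam`, and the recombination of S and A are assumptions.
import numpy as np

def game_gradient(theta):
    """Simultaneous gradient xi(theta) of a toy two-player game.

    Player 1 controls theta[0] and minimizes 0.5*x**2 + 10*x*y;
    player 2 controls theta[1] and minimizes 0.5*y**2 - 10*x*y
    (a bilinear coupling with strongly rotational dynamics).
    """
    x, y = theta
    return np.array([x + 10.0 * y,    # d loss1 / dx
                     y - 10.0 * x])   # d loss2 / dy

def game_jacobian(theta, eps=1e-6):
    """Central finite-difference Jacobian of the simultaneous gradient."""
    d = theta.size
    J = np.zeros((d, d))
    for i in range(d):
        e = np.zeros(d)
        e[i] = eps
        J[:, i] = (game_gradient(theta + e) - game_gradient(theta - e)) / (2 * eps)
    return J

def newton_helmholtz_step(theta, lam=1.0):
    """One damped Newton-type step using the symmetric/antisymmetric split of J."""
    xi = game_gradient(theta)
    J = game_jacobian(theta)
    S = 0.5 * (J + J.T)   # potential (curl-free) component
    A = 0.5 * (J - J.T)   # Hamiltonian (divergence-free) component
    # Placeholder direction: a regularized Newton-like solve on S + A (= J).
    # Algorithm 1 in the paper treats the two components differently.
    delta = np.linalg.solve(S + A + lam * np.eye(theta.size), -xi)
    return theta + delta

theta = np.array([2.0, -1.0])
for _ in range(20):
    theta = newton_helmholtz_step(theta)
print("final joint strategy:", theta)   # converges toward the (0, 0) equilibrium
```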
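The Experiment Setup row gives concrete settings for the bimatrix games. The sketch below is a minimal illustration of how those numbers translate into a Boltzmann (softmax) parametrization of Matching Pennies: the quoted probabilities [0.86, 0.14] and [0.14, 0.86] are mapped to logits and updated with learning rate 0.1. The payoff matrices and the plain simultaneous-gradient update are assumptions made for illustration; the paper's experiments use the NOHD direction instead.

```python
# Minimal sketch (illustration only) of the bimatrix setting quoted above:
# Boltzmann (softmax) policies on Matching Pennies, initialized at
# [0.86, 0.14] and [0.14, 0.86], learning rate 0.1. The payoff matrices and
# the naive simultaneous-gradient update are assumptions; the paper's
# experiments apply the NOHD direction instead of this plain gradient step.
import numpy as np

A1 = np.array([[1., -1.], [-1., 1.]])  # assumed payoff of player 1 (the "matcher")
A2 = -A1                               # zero-sum: player 2 receives the negative

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def logits_from_probs(p):
    return np.log(p)  # any additive shift of the logits gives the same softmax

def loss_grads(theta1, theta2):
    """Gradient of each player's expected loss w.r.t. its own logits."""
    p1, p2 = softmax(theta1), softmax(theta2)
    Jsm1 = np.diag(p1) - np.outer(p1, p1)  # softmax Jacobian, player 1
    Jsm2 = np.diag(p2) - np.outer(p2, p2)  # softmax Jacobian, player 2
    g1 = Jsm1 @ (-A1 @ p2)    # loss1 = -p1^T A1 p2
    g2 = Jsm2 @ (-A2.T @ p1)  # loss2 = -p1^T A2 p2
    return g1, g2

theta1 = logits_from_probs(np.array([0.86, 0.14]))
theta2 = logits_from_probs(np.array([0.14, 0.86]))
lr = 0.1  # one of the learning rates 0.1 / 0.5 / 1.0 quoted in the table

for _ in range(200):
    g1, g2 = loss_grads(theta1, theta2)
    theta1, theta2 = theta1 - lr * g1, theta2 - lr * g2  # naive simultaneous GD

print("player 1:", softmax(theta1), "player 2:", softmax(theta2))
# The naive update does not converge: it orbits the mixed equilibrium [0.5, 0.5],
# the rotational behaviour that second-order methods such as NOHD aim to handle.
```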