Newton Optimization on Helmholtz Decomposition for Continuous Games

Authors: Giorgia Ramponi, Marcello Restelli

AAAI 2021, pages 11325-11333

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we empirically compare NOHD's performance with state-of-the-art algorithms on some bimatrix games and in a continuous Gridworld environment. Finally, in Section 6, we analyze the empirical performance of NOHD when agents optimize a Boltzmann policy in three bimatrix games: Prisoner's Dilemma, Matching Pennies, and Rock-Paper-Scissors. In the last experiment, we study the learning performance of NOHD in two continuous gridworld environments. In all experiments, NOHD achieves great results confirming the quadratic nature of the update. (A Boltzmann-policy setup for these bimatrix games is sketched after the table.)
Researcher Affiliation | Academia | Giorgia Ramponi, Marcello Restelli, Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milano, Italy; {giorgia.ramponi, marcello.restelli}@polimi.it
Pseudocode | Yes | Algorithm 1: NOHD (a hedged sketch of a Newton-style step on the Helmholtz decomposition appears after the table).
Open Source Code | No | The paper does not provide a specific link or explicit statement about the release of its source code for the described methodology.
Open Datasets | Yes | The first gridworld is the continuous version of the second gridworld proposed in (Hu and Wellman 2003): the two agents are initialized in the two opposite lower corners and have to reach the same goal; when one of the two agents reaches the goal, the game ends and this agent gets a positive reward. (A toy sketch of this environment appears after the table.)
Dataset Splits | No | The paper mentions experimental settings such as initializations, number of runs, and sampled trajectories, but does not provide specific details on dataset splits (e.g., percentages or counts) for training, validation, or test sets.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory, or cloud instances) used for running its experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or solvers used in the experiments.
Experiment Setup | Yes | For each game, we perform experiments with learning rates 0.1, 0.5, 1.0. In Matching Pennies we initialize probabilities to [0.86, 0.14] for the first agent and to [0.14, 0.86] for the second agent; in Rock-Paper-Scissors, to [0.66, 0.24, 0.1]. We performed 20 runs for each setting. In each iteration, we sampled 300 trajectories of length 1. The agents' policies are Gaussian policies, linear in a set of respectively 72 and 68 radial basis functions, which generate the ν angle for the step's direction. (A minimal sketch of such an RBF Gaussian policy appears after the table.)
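
The bimatrix experiments quoted above have each agent optimize a Boltzmann (softmax) policy over its actions. Below is a minimal sketch of such a setup for Matching Pennies; the payoff matrices, the softmax parameterization, and the logits chosen to recover the reported [0.86, 0.14] / [0.14, 0.86] initial probabilities are standard illustrative choices, not code from the paper.

```python
import numpy as np

# Matching Pennies payoff matrices (A for the row player, B for the column player).
A = np.array([[ 1., -1.],
              [-1.,  1.]])
B = -A  # zero-sum game

def boltzmann(theta):
    """Softmax (Boltzmann) distribution over actions."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

def expected_payoffs(theta1, theta2):
    """Expected payoff of each player under the two Boltzmann policies."""
    p, q = boltzmann(theta1), boltzmann(theta2)
    return p @ A @ q, p @ B @ q

# Logits whose softmax matches the reported initial probabilities.
theta1 = np.log(np.array([0.86, 0.14]))
theta2 = np.log(np.array([0.14, 0.86]))
print(expected_payoffs(theta1, theta2))
```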
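
Algorithm 1 (NOHD) is a Newton-style update built on the Helmholtz decomposition of the Jacobian of the game's simultaneous gradient, which in the differentiable-games literature is the split into a symmetric (potential) part and an antisymmetric (Hamiltonian) part. The sketch below shows that split plus a generic damped Newton step on a toy zero-sum quadratic game; the damping constant and the particular solve are illustrative assumptions, and the exact way NOHD combines the two components is given only in the paper's Algorithm 1.

```python
import numpy as np

def helmholtz_split(J):
    """Helmholtz decomposition of the Jacobian of the simultaneous
    gradient xi: J = S + A with S symmetric (potential part) and
    A antisymmetric (Hamiltonian part)."""
    return 0.5 * (J + J.T), 0.5 * (J - J.T)

def damped_newton_step(xi, J, lam=1e-2):
    """Generic damped Newton step towards a zero of xi, solving
    (J + lam * I) d = xi. NOHD's actual step is built from the S and A
    components (Algorithm 1); lam and this solve are illustrative."""
    return np.linalg.solve(J + lam * np.eye(len(xi)), xi)

# Tiny two-player example: f1(x, y) = x * y, f2(x, y) = -x * y,
# i.e. Matching-Pennies-like dynamics with a purely rotational field.
x, y = 0.5, -0.3
xi = np.array([y, -x])            # (d f1 / d x, d f2 / d y)
J = np.array([[0.0, 1.0],
              [-1.0, 0.0]])       # Jacobian of xi w.r.t. (x, y)
S, A = helmholtz_split(J)          # here S == 0 and A == J
theta_new = np.array([x, y]) - damped_newton_step(xi, J)
print(S, A, theta_new)
```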
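
The Open Datasets evidence describes a two-agent continuous variant of the second grid game in (Hu and Wellman 2003): both agents start in opposite lower corners, the episode ends as soon as one agent reaches the shared goal, and that agent receives a positive reward. A toy environment along those lines is sketched below; the arena size, step length, goal position, goal radius, and reward value are assumptions for illustration only.

```python
import numpy as np

class ContinuousGridworld:
    """Toy two-agent continuous gridworld in the spirit of the
    description above; the numeric constants are not from the paper."""

    def __init__(self, size=1.0, step=0.1, goal_radius=0.05):
        self.size, self.step, self.goal_radius = size, step, goal_radius
        self.goal = np.array([size / 2.0, size])         # shared goal (assumed position)
        self.pos = [np.array([0.0, 0.0]),                # opposite lower corners
                    np.array([size, 0.0])]

    def step_env(self, angles):
        """Each agent moves a fixed-length step in direction nu; the
        episode ends as soon as one agent reaches the goal, and that
        agent receives a positive reward."""
        rewards, done = [0.0, 0.0], False
        for i, nu in enumerate(angles):
            move = self.step * np.array([np.cos(nu), np.sin(nu)])
            self.pos[i] = np.clip(self.pos[i] + move, 0.0, self.size)
            if np.linalg.norm(self.pos[i] - self.goal) < self.goal_radius:
                rewards[i], done = 1.0, True
        return self.pos, rewards, done

env = ContinuousGridworld()
print(env.step_env([np.pi / 2, np.pi / 2]))
```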
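
The gridworld agents use Gaussian policies that are linear in 72 and 68 radial basis features and output the angle ν of the step direction. The sketch below shows one plausible parameterization; the bandwidth, the center layout (a 9 x 8 grid giving 72 features), and the fixed standard deviation are illustrative assumptions, not details from the paper.

```python
import numpy as np

def rbf_features(state, centers, bandwidth=0.5):
    """Radial basis features of a 2-D state; `centers` has shape (k, 2)."""
    d2 = np.sum((centers - state) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def sample_angle(state, weights, centers, sigma=0.1):
    """Gaussian policy, linear in RBF features, sampling the angle nu
    that sets the direction of the agent's step."""
    mean = weights @ rbf_features(state, centers)
    return np.random.normal(mean, sigma)

# Hypothetical 9 x 8 grid of centers giving the 72 features reported
# for the first agent; the paper does not state how centers are laid out.
centers = np.array([(cx, cy) for cx in np.linspace(0.0, 1.0, 9)
                              for cy in np.linspace(0.0, 1.0, 8)])
weights = np.zeros(len(centers))
print(sample_angle(np.array([0.1, 0.1]), weights, centers))
```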