Proximal Learning With Opponent-Learning Awareness

Authors: Stephen Zhao, Chris Lu, Roger B. Grosse, Jakob Foerster

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We then present practical approximations to the ideal POLA update, which we evaluate in several partially competitive environments with function approximation and opponent modeling. This empirically demonstrates that POLA achieves reciprocity-based cooperation more reliably than LOLA.
Researcher Affiliation | Collaboration | Stephen Zhao (University of Toronto and Vector Institute; stephen.zhao@mail.utoronto.ca); Chris Lu (FLAIR, University of Oxford; christopher.lu@exeter.ox.ac.uk); Roger Grosse (University of Toronto and Vector Institute; rgrosse@cs.toronto.edu); Jakob Foerster (FLAIR, University of Oxford; jakob.foerster@eng.ox.ac.uk)
Pseudocode | Yes | Algorithm 1 (Outer POLA, 2-agent formulation: update for agent 1); Algorithm 2 (POLA-DiCE, 2-agent formulation: update for agent 1)
Open Source Code | Yes | For reproducibility, our code is available at: https://github.com/Silent-Zebra/POLA.
Open Datasets | No | The paper describes the experimental environments (the iterated prisoner's dilemma and the Coin Game) and how agents interact within them, but does not provide concrete access information (link, DOI, formal citation) for datasets used in training, nor does it state that these environments come with pre-defined public datasets. The experiments are conducted in simulated environments rather than on external, pre-existing datasets.
Dataset Splits | No | The paper describes running experiments and evaluating performance but does not state explicit training/validation/test splits (as percentages or counts). It refers to 'training' and 'evaluation' but not to any explicit data partitioning for these phases.
Hardware Specification | Yes | Appendix B.5 states: 'All experiments were run on an internal cluster of NVIDIA GeForce RTX 2080 Ti GPUs and Intel Xeon Gold 6248 CPUs, using up to 10 GPUs and 20 CPU cores per experiment. Training times varied widely based on algorithm and environment, from a few minutes to tens of hours for a single seed.'
Software Dependencies | No | The paper mentions using PyTorch in Appendix B.4 ('We adapted existing PyTorch implementations of the environments mentioned') but does not give specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | Appendix B.1.4 discusses hyperparameter settings in more detail. For more on the problem setting, policy parameterization, and hyperparameters, see Appendix B.2; Appendix B.3 provides further detail on the Coin Game setup. The paper names specific settings such as the cooperation factor f_R, the learning rate η, the penalty strength β_out, and the numbers of outer steps M and inner steps K; a hedged sketch of how these enter the outer proximal loop follows this table.
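
The pseudocode and experiment-setup rows above name Algorithm 1 (Outer POLA) and the hyperparameters η (learning rate), β_out (penalty strength), M (outer steps), and K (inner steps). Below is a minimal sketch of how such an outer proximal loop could be structured, assuming a PyTorch policy module and a caller-supplied policy_objective function; the L2 parameter-space penalty stands in for the paper's policy-space proximal term, and none of the names are taken from the authors' implementation (see the linked repository for that).

    # Hypothetical sketch in the spirit of Algorithm 1 ("Outer POLA");
    # not the authors' implementation. `policy_objective`, `eta`, `beta_out`,
    # `M`, and `K` are illustrative placeholders.
    import torch


    def outer_pola_update(policy, policy_objective, eta=0.01, beta_out=1.0, M=5, K=10):
        """Run M approximate proximal steps; each step takes K gradient updates on
        the (opponent-shaped) objective minus a proximal penalty that keeps the new
        policy close to the policy held at the start of the step."""
        for _ in range(M):
            # Freeze the current parameters as the proximal anchor for this outer step.
            anchor = [p.detach().clone() for p in policy.parameters()]
            optimizer = torch.optim.SGD(policy.parameters(), lr=eta)
            for _ in range(K):
                objective = policy_objective(policy)  # expected return, e.g. from rollouts
                # Simplified L2 penalty in parameter space; the paper's ideal update
                # instead penalizes divergence in policy space.
                penalty = sum(((p - a) ** 2).sum()
                              for p, a in zip(policy.parameters(), anchor))
                loss = -(objective - beta_out * penalty)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        return policy

In the paper's practical variant (Algorithm 2, POLA-DiCE), the objectives are estimated with DiCE and an analogous inner proximal step is applied to the opponent model; the proximal anchoring is what makes the update less sensitive to the choice of policy parameterization than LOLA.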