COLA: Consistent Learning with Opponent-Learning Awareness

Authors: Timon Willi, Alistair Hp Letcher, Johannes Treutlein, Jakob Foerster

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, in Sections 5 and 6, we report our experimental setup and results, investigating COLA and HOLA and comparing COLA to LOLA and CGD in a range of games.
Researcher Affiliation | Academia | (1) Department of Engineering Science, University of Oxford, United Kingdom; (2) Department of Computer Science, University of Toronto, Canada; (3) Vector Institute, Toronto, Canada.
Pseudocode | No | The paper describes its methods in text but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement or link providing access to open-source code for the described methodology.
Open Datasets | No | The experiments are run on game environments defined by loss functions rather than on publicly available datasets with specified access information.
Dataset Splits | No | The paper does not specify training, validation, or test splits; its experiments are conducted in game environments rather than on traditional datasets.
Hardware Specification | No | A part of this work was done while Timon Willi and Jakob Foerster were at the Vector Institute, University of Toronto. They are grateful for the access to the Vector Institute's compute infrastructure. They are also grateful for the access to the Advanced Research Computing (ARC) infrastructure.
Software Dependencies | No | All code was implemented using Python. The code relies on the PyTorch library for autodifferentiability (Paszke et al., 2019). The optimizer used is Adam (Kingma & Ba, 2015).
Experiment Setup | Yes | For the polynomial games, COLA uses a neural network with 1 non-linear layer for both h1(θ1, θ2) and h2(θ1, θ2). The non-linearity is a ReLU function. The layer has 8 nodes. For training, we randomly sample pairs of parameters on a [-1, 1] parameter region. ... We use a batch size of 8. We found that training is improved with a learning rate scheduler. For the learning rate scheduling we use a γ of 0.9. We train the neural network for 120,000 steps. ... For the non-polynomial games, we deploy a neural network with 3 non-linear layers using Tanh activation functions. Each layer has 16 nodes. For this type of game, the parameter region is set to [-7, 7]... During training, we used a batch size of 64.
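
The Experiment Setup row above fixes concrete architecture and training choices. The following is a minimal PyTorch sketch of that configuration, not the authors' implementation: layer counts, widths, activations, sampling ranges, batch sizes, Adam, a scheduler gamma of 0.9, and 120,000 steps follow the quoted description, while the learning rate, the scheduler type and step interval, and the consistency_loss placeholder are assumptions (COLA's actual consistency objective is not given in the excerpts).

# Hedged sketch of the configuration described in the "Experiment Setup" row.
# consistency_loss is a hypothetical stand-in; lr and scheduler details are assumed.
import torch
import torch.nn as nn

def make_update_net(polynomial_game: bool) -> nn.Sequential:
    """Network mapping (theta1, theta2) to an update for one player."""
    if polynomial_game:
        # 1 non-linear layer, 8 nodes, ReLU (polynomial games).
        return nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1))
    # 3 non-linear layers, 16 nodes each, Tanh (non-polynomial games).
    return nn.Sequential(
        nn.Linear(2, 16), nn.Tanh(),
        nn.Linear(16, 16), nn.Tanh(),
        nn.Linear(16, 16), nn.Tanh(),
        nn.Linear(16, 1),
    )

def consistency_loss(h1, h2, thetas):
    # Placeholder objective to keep the sketch runnable; the excerpts do not
    # spell out COLA's consistency loss.
    return (h1(thetas) ** 2 + h2(thetas) ** 2).mean()

polynomial_game = True
param_range = 1.0 if polynomial_game else 7.0   # [-1, 1] vs. [-7, 7]
batch_size = 8 if polynomial_game else 64
steps = 120_000

h1 = make_update_net(polynomial_game)
h2 = make_update_net(polynomial_game)
# Learning rate is not stated in the excerpts; 1e-3 is an assumption.
opt = torch.optim.Adam(list(h1.parameters()) + list(h2.parameters()), lr=1e-3)
# Scheduler with gamma = 0.9; the scheduler type and step interval are assumptions.
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=10_000, gamma=0.9)

for step in range(steps):
    # Randomly sample (theta1, theta2) pairs from the parameter region.
    thetas = (torch.rand(batch_size, 2) * 2 - 1) * param_range
    loss = consistency_loss(h1, h2, thetas)
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step()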