COLA: Consistent Learning with Opponent-Learning Awareness
Authors: Timon Willi, Alistair Hp Letcher, Johannes Treutlein, Jakob Foerster
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, in Sections 5 and 6, we report our experimental setup and results, investigating COLA and HOLA and comparing COLA to LOLA and CGD in a range of games. |
| Researcher Affiliation | Academia | 1Department of Engineering Science, University of Oxford, United Kingdom 2Department of Computer Science, University of Toronto, Canada 3Vector Institute, Toronto, Canada. |
| Pseudocode | No | The paper describes methods in text but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement or link providing access to open-source code for the described methodology. |
| Open Datasets | No | The paper runs its experiments on game environments defined by analytic loss functions rather than on publicly available datasets, so no dataset access information is provided. |
| Dataset Splits | No | The paper does not specify training, validation, or test splits; its experiments take place in game environments rather than on traditional datasets. |
| Hardware Specification | No | A part of this work was done while Timon Willi and Jakob Foerster were at the Vector Institute, University of Toronto. They are grateful for the access to the Vector Institute's compute infrastructure. They are also grateful for the access to the Advanced Research Computing (ARC) infrastructure. |
| Software Dependencies | No | All code was implemented using Python. The code relies on the PyTorch library for autodifferentiability (Paszke et al., 2019). The optimizer used is Adam (Kingma & Ba, 2015). Version numbers for these dependencies are not given. |
| Experiment Setup | Yes | For the polynomial games, COLA uses a neural network with 1 non-linear layer for both h1(θ1, θ2) and h2(θ1, θ2). The non-linearity is a ReLU function. The layer has 8 nodes. For training, we randomly sample pairs of parameters on a [-1, 1] parameter region. ... We use a batch size of 8. We found that training is improved with a learning rate scheduler. For the learning rate scheduling we use a γ of 0.9. We train the neural network for 120,000 steps. ... For the non-polynomial games, we deploy a neural network with 3 non-linear layers using Tanh activation functions. Each layer has 16 nodes. For this type of game, the parameter region is set to [-7, 7]... During training, we used a batch size of 64. (A minimal code sketch of this setup follows the table.) |
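
Since the paper does not release code, the following is a minimal PyTorch sketch of the experiment setup quoted in the last row, assuming a two-player game with one scalar parameter per player. The class name `ColaNet`, the input layout, the stand-in `consistency_loss`, and the scheduler stepping cadence are hypothetical; only the layer counts and widths, activations, parameter regions, batch sizes, the scheduler γ of 0.9, the Adam optimizer, and the 120,000 training steps come from the paper's description.

```python
import torch
import torch.nn as nn

class ColaNet(nn.Module):
    """Update network h_i(theta1, theta2) for one player (name is hypothetical)."""
    def __init__(self, hidden_sizes, activation):
        super().__init__()
        layers, in_dim = [], 2  # input: the parameter pair (theta1, theta2)
        for width in hidden_sizes:
            layers += [nn.Linear(in_dim, width), activation()]
            in_dim = width
        layers.append(nn.Linear(in_dim, 1))  # scalar update for this player
        self.net = nn.Sequential(*layers)

    def forward(self, theta):
        return self.net(theta)

def consistency_loss(theta, u1, u2):
    # Stand-in objective so the sketch runs end to end; the actual COLA
    # loss enforces the consistency equations defined in the paper.
    return (u1 ** 2 + u2 ** 2).mean()

# Polynomial games: 1 ReLU layer with 8 nodes, region [-1, 1], batch size 8.
# Non-polynomial games would instead use hidden_sizes=[16, 16, 16],
# activation=nn.Tanh, a [-7, 7] region, and batch size 64.
h1 = ColaNet(hidden_sizes=[8], activation=nn.ReLU)
h2 = ColaNet(hidden_sizes=[8], activation=nn.ReLU)
region, batch_size, num_steps = 1.0, 8, 120_000

opt = torch.optim.Adam(list(h1.parameters()) + list(h2.parameters()))
sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.9)

for step in range(num_steps):
    # Randomly sample parameter pairs uniformly from the parameter region.
    theta = (torch.rand(batch_size, 2) * 2 - 1) * region
    loss = consistency_loss(theta, h1(theta), h2(theta))
    opt.zero_grad()
    loss.backward()
    opt.step()
    # The paper does not say how often the scheduler steps; decaying every
    # 10,000 iterations is an assumption made here for illustration.
    if step > 0 and step % 10_000 == 0:
        sched.step()
```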