LOQA: Learning with Opponent Q-Learning Awareness
Authors: Milad Aghajohari, Juan Agustin Duque, Tim Cooijmans, Aaron Courville
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate the effectiveness of LOQA at achieving state-of-the-art performance in benchmark scenarios such as the Iterated Prisoner's Dilemma and the Coin Game. |
| Researcher Affiliation | Academia | Milad Aghajohari, Juan Agustin Duque, Tim Cooijmans, Aaron Courville; University of Montreal & Mila; firstname.lastname@umontreal.ca |
| Pseudocode | Yes | Algorithm 1 (LOQA) and Algorithm 2 (LOQA ACTOR LOSS) provide structured pseudocode; a hedged sketch of the actor-loss idea appears after this table. |
| Open Source Code | Yes | For reproducing our results on the IPD and the Coin Game please visit this link. This is an anonymized repository and the instructions for reproducing the results and the seeds are provided. |
| Open Datasets | Yes | We consider two general-sum environments to evaluate LOQA against the current state-of-the-art, namely, the Iterated Prisoner's Dilemma (IPD) and the Coin Game. The Coin Game was initially described in Lerer & Peysakhovich (2018); a minimal IPD step function appears after this table. |
| Dataset Splits | No | The paper does not provide explicit training/validation/test dataset splits (e.g., percentages or sample counts). The environments used (IPD, Coin Game) are described, but not in terms of how fixed datasets for these were split for validation purposes. |
| Hardware Specification | Yes | More importantly, our agents are fully trained after only 2 hours of compute time on an Nvidia A100 GPU, compared to the 8 hours of training it takes POLA to achieve the results shown in Figure 2. We run our Coin Game experiments for 15 minutes on a single A100 GPU with 80 gigabytes of GPU memory and 20 gigabytes of CPU memory. |
| Software Dependencies | No | The paper mentions the JAX ecosystem (Bradbury et al., 2018) but does not provide version numbers for any of its software dependencies or libraries. |
| Experiment Setup | Yes | The training is done for 4500 iterations (approximately 15 minutes on an Nvidia A100 GPU) using a batch size of 2048. The hyperparameters of our training are indicated in Table 3; a hedged configuration sketch collecting these settings appears after this table. |
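
To make the pseudocode row concrete, here is a minimal sketch of the opponent-awareness idea behind the LOQA actor loss, reduced to a one-shot Prisoner's Dilemma. Following the paper's premise, the opponent is modeled as acting with a softmax over its Q-values; in this toy reduction those Q-values are simply the opponent's expected payoff against the agent's current policy. The names (`R_AGENT`, `R_OPP`, `actor_loss`) and the one-shot simplification are illustrative assumptions, not the authors' implementation.

```python
import jax
import jax.numpy as jnp

# One-shot Prisoner's Dilemma payoffs; 0 = cooperate, 1 = defect.
# Rows index the agent's action, columns the opponent's action.
R_AGENT = jnp.array([[-1., -3.],
                     [ 0., -2.]])
R_OPP = jnp.array([[-1.,  0.],
                   [-3., -2.]])

def actor_loss(agent_logits, temperature=1.0):
    # The agent's stochastic policy over {cooperate, defect}.
    pi_agent = jax.nn.softmax(agent_logits)
    # Assumed opponent model: Q-values equal the opponent's expected
    # payoff for each of its actions against the agent's current policy.
    q_opp = pi_agent @ R_OPP
    # LOQA's premise: the opponent acts with a softmax over its Q-values,
    # which makes the opponent's policy differentiable in agent_logits.
    pi_opp = jax.nn.softmax(q_opp / temperature)
    # Negative expected agent payoff under the joint policy.
    return -(pi_agent @ R_AGENT @ pi_opp)

# The gradient carries an opponent-shaping term that flows through pi_opp.
grad = jax.grad(actor_loss)(jnp.zeros(2))
```

Because `pi_opp` is differentiable in `agent_logits`, the gradient includes a term describing how the agent's behavior reshapes the opponent's policy; the full iterated-game algorithm exploits the same mechanism over trajectories.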
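
For reference, the IPD environment quoted in the open-datasets row is conventionally set up as follows in this line of work: actions 0/1 for cooperate/defect, the standard payoff matrix, and an observation equal to the last pair of actions. This is a generic sketch, not the repository's code.

```python
import jax.numpy as jnp

# Conventional IPD payoff tensor from this literature;
# PAYOFFS[a1, a2] gives (reward for player 1, reward for player 2).
PAYOFFS = jnp.array([[[-1., -1.], [-3.,  0.]],
                     [[ 0., -3.], [-2., -2.]]])

def ipd_step(action_1, action_2):
    # Both players act simultaneously; the next observation is the
    # pair of last actions (the usual one-step-memory formulation).
    rewards = PAYOFFS[action_1, action_2]
    next_obs = jnp.array([action_1, action_2])
    return next_obs, rewards

obs, rewards = ipd_step(0, 1)  # cooperate vs. defect -> rewards (-3., 0.)
```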
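
Finally, a hedged configuration sketch collecting the training settings stated in the experiment-setup row. Only `num_iterations` and `batch_size` come from the paper's text; the remaining fields are placeholders standing in for Table 3 entries, which are not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    # Stated in the paper's text:
    num_iterations: int = 4500  # roughly 15 minutes on one Nvidia A100
    batch_size: int = 2048
    # Placeholders standing in for Table 3 entries (not reproduced here):
    actor_lr: float = 1e-3
    critic_lr: float = 1e-3
    temperature: float = 1.0

config = TrainConfig()
```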