LOQA: Learning with Opponent Q-Learning Awareness
Authors: Milad Aghajohari, Juan Agustin Duque, Tim Cooijmans, Aaron Courville
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate the effectiveness of LOQA at achieving state-of-the-art performance in benchmark scenarios such as the Iterated Prisoner's Dilemma and the Coin Game. |
| Researcher Affiliation | Academia | Milad Aghajohari, Juan Agustin Duque, Tim Cooijmans, Aaron Courville; University of Montreal & Mila; firstname.lastname@umontreal.ca |
| Pseudocode | Yes | Algorithm 1 (LOQA) and Algorithm 2 (LOQA ACTOR LOSS) provide structured pseudocode; a hedged sketch of the actor-loss idea appears after this table. |
| Open Source Code | Yes | For reproducing our results on the IPD and the Coin Game please visit this link. This is an anonymized repository and the instructions for reproducing the results and the seeds are provided. |
| Open Datasets | Yes | We consider two general-sum environments to evaluate LOQA against the current state-of-the-art, namely, the Iterated Prisoner's Dilemma (IPD) and the Coin Game. The Coin Game was initially described in Lerer & Peysakhovich (2018); a minimal IPD step function appears after this table. |
| Dataset Splits | No | The paper does not provide explicit training/validation/test dataset splits (e.g., percentages or sample counts). The environments used (IPD, Coin Game) are described, but not in terms of how fixed datasets for these were split for validation purposes. |
| Hardware Specification | Yes | More importantly, our agents are fully trained after only 2 hours of compute time on an Nvidia A100 GPU, compared to the 8 hours of training it takes POLA to achieve the results shown in Figure 2. We run our Coin Game experiments for 15 minutes on a single A100 GPU with 80 gigabytes of GPU memory and 20 gigabytes of CPU memory. |
| Software Dependencies | No | The paper mentions the JAX ecosystem (Bradbury et al., 2018) but does not provide version numbers for any of its software dependencies or libraries. |
| Experiment Setup | Yes | The training is done for 4500 iterations (approximately 15 minutes on an Nvidia A100 GPU) using a batch size of 2048. The hyperparameters of our training are indicated in Table 3; a hedged configuration sketch collecting these settings appears after this table. |
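
To make the pseudocode row concrete, here is a minimal sketch of the opponent-awareness idea behind the LOQA actor loss, reduced to a one-shot Prisoner's Dilemma. Following the paper's premise, the opponent is modeled as acting with a softmax over its Q-values; in this toy reduction those Q-values are simply the opponent's expected payoff against the agent's current policy. The names (`R_AGENT`, `R_OPP`, `actor_loss`) and the one-shot simplification are illustrative assumptions, not the authors' implementation.

```python
import jax
import jax.numpy as jnp

# One-shot Prisoner's Dilemma payoffs; 0 = cooperate, 1 = defect.
# Rows index the agent's action, columns the opponent's action.
R_AGENT = jnp.array([[-1., -3.],
                     [ 0., -2.]])
R_OPP = jnp.array([[-1.,  0.],
                   [-3., -2.]])

def actor_loss(agent_logits, temperature=1.0):
    # The agent's stochastic policy over {cooperate, defect}.
    pi_agent = jax.nn.softmax(agent_logits)
    # Assumed opponent model: Q-values equal the opponent's expected
    # payoff for each of its actions against the agent's current policy.
    q_opp = pi_agent @ R_OPP
    # LOQA's premise: the opponent acts with a softmax over its Q-values,
    # which makes the opponent's policy differentiable in agent_logits.
    pi_opp = jax.nn.softmax(q_opp / temperature)
    # Negative expected agent payoff under the joint policy.
    return -(pi_agent @ R_AGENT @ pi_opp)

# The gradient carries an opponent-shaping term that flows through pi_opp.
grad = jax.grad(actor_loss)(jnp.zeros(2))
```

Because `pi_opp` is differentiable in `agent_logits`, the gradient includes a term describing how the agent's behavior reshapes the opponent's policy; the full iterated-game algorithm exploits the same mechanism over trajectories.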
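
For reference, the IPD environment quoted in the open-datasets row is conventionally set up as follows in this line of work: actions 0/1 for cooperate/defect, the standard payoff matrix, and an observation equal to the last pair of actions. This is a generic sketch, not the repository's code.

```python
import jax.numpy as jnp

# Conventional IPD payoff tensor from this literature;
# PAYOFFS[a1, a2] gives (reward for player 1, reward for player 2).
PAYOFFS = jnp.array([[[-1., -1.], [-3.,  0.]],
                     [[ 0., -3.], [-2., -2.]]])

def ipd_step(action_1, action_2):
    # Both players act simultaneously; the next observation is the
    # pair of last actions (the usual one-step-memory formulation).
    rewards = PAYOFFS[action_1, action_2]
    next_obs = jnp.array([action_1, action_2])
    return next_obs, rewards

obs, rewards = ipd_step(0, 1)  # cooperate vs. defect -> rewards (-3., 0.)
```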
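
Finally, a hedged configuration sketch collecting the training settings stated in the experiment-setup row. Only `num_iterations` and `batch_size` come from the paper's text; the remaining fields are placeholders standing in for Table 3 entries, which are not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    # Stated in the paper's text:
    num_iterations: int = 4500  # roughly 15 minutes on one Nvidia A100
    batch_size: int = 2048
    # Placeholders standing in for Table 3 entries (not reproduced here):
    actor_lr: float = 1e-3
    critic_lr: float = 1e-3
    temperature: float = 1.0

config = TrainConfig()
```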