Towards Offline Opponent Modeling with In-context Learning
Authors: Yuheng Jing, Kai Li, Bingyun Liu, Yifan Zang, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments and ablation studies on competitive environments with sparse and dense rewards demonstrate the impressive performance of TAO. Our approach manifests remarkable prowess for fast adaptation, especially in the face of unseen opponent policies, confirming its in-context learning potency. |
| Researcher Affiliation | Collaboration | Yuheng Jing1,2, Kai Li1,2, Bingyun Liu1,2, Yifan Zang1,2, Haobo Fu6, Qiang Fu6, Junliang Xing5, Jian Cheng1,3,4 (corresponding authors are marked in the paper). 1 Institute of Automation, Chinese Academy of Sciences; 2 School of Artificial Intelligence, University of Chinese Academy of Sciences; 3 School of Future Technology, University of Chinese Academy of Sciences; 4 AiRiA; 5 Tsinghua University; 6 Tencent AI Lab |
| Pseudocode | Yes | Algorithm 1 Transformer Against Opponent (TAO) |
| Open Source Code | No | The paper provides links to third-party open-source code used for environments and baselines (e.g., Open Spiel, Multi-Agent Particle Environment, Prompt-DT) but does not provide a link or explicit statement for the open-source code of TAO itself. |
| Open Datasets | Yes | We consider two emblematic competitive environmental benchmarks: 1) Markov Soccer (MS) (Lanctot et al., 2019)... For specific implementation of MS, we adopt the open-source code of Open Spiel, which is available at https://github.com/deepmind/open_spiel. 2) Particleworld Adversary (PA) (Lowe et al., 2017)... For specific implementation of PA, we adopt the open-source code of Multi-Agent Particle Environment, which is available at https://github.com/openai/multiagent-particle-envs. (A minimal environment-loading sketch is given after this table.) |
| Dataset Splits | No | The paper describes the construction of the offline dataset D_off and the configurations of opponent policies (Π_off and Π_test with seen, unseen, and mixed opponents) used for training and testing. However, it does not explicitly specify traditional training/validation/test splits (e.g., percentages or counts) for a single dataset, nor does it mention a dedicated validation set for hyperparameter tuning or early stopping. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using GPT2 (Radford et al., 2019) from Hugging Face (Wolf et al., 2020), the AdamW (Loshchilov & Hutter, 2018) optimizer, and the PPO (Schulman et al., 2017) algorithm (see the dependency sketch after this table). However, it does not specify version numbers for these dependencies or for underlying libraries such as Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | Section J 'HYPERPARAMETERS' provides detailed tables (J.1, J.2, J.3) for 'HYPERPARAMETERS FOR OFFLINE STAGE 1', 'HYPERPARAMETERS FOR OFFLINE STAGE 2', and 'HYPERPARAMETERS FOR DEPLOYMENT STAGE', including learning rates, batch sizes, number of training steps, and architectural details. |
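The rows on open datasets and software dependencies can be made concrete with the short sketch below. It is a minimal sketch, assuming the open_spiel, multiagent-particle-envs, transformers, and torch packages are installed; the game and scenario names are taken from the cited repositories, while the GPT-2 configuration and AdamW settings are illustrative placeholders, not values reported in the paper.

```python
# Minimal reproduction sketch for the setup described above.
# Assumptions: open_spiel, multiagent-particle-envs, transformers, and torch
# are installed; model sizes and optimizer settings are placeholders, not the
# paper's hyperparameters (those are listed in the paper's Section J).
import pyspiel                              # DeepMind OpenSpiel
from multiagent import scenarios            # OpenAI multiagent-particle-envs
from multiagent.environment import MultiAgentEnv
import torch
from transformers import GPT2Config, GPT2Model

# Markov Soccer (MS) via OpenSpiel.
soccer = pyspiel.load_game("markov_soccer")
state = soccer.new_initial_state()

# Particleworld Adversary (PA) via the MPE "simple_adversary" scenario.
scenario = scenarios.load("simple_adversary.py").Scenario()
world = scenario.make_world()
env = MultiAgentEnv(world, scenario.reset_world, scenario.reward, scenario.observation)

# GPT-2 backbone from Hugging Face with the AdamW optimizer, as named in the paper.
config = GPT2Config(n_embd=128, n_layer=3, n_head=1)
backbone = GPT2Model(config)
optimizer = torch.optim.AdamW(backbone.parameters(), lr=1e-4, weight_decay=1e-4)
```

Pinning exact versions of these packages (e.g., in a requirements file) would resolve the missing-version issue flagged in the Software Dependencies row.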