Opponent Modeling with In-context Search
Authors: Yuheng Jing, Bingyun Liu, Kai Li, Yifan Zang, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, in competitive, cooperative, and mixed environments, OMIS demonstrates more effective and stable adaptation to opponents than other approaches. See our project website at https://sites.google.com/view/nips2024-omis. (Section 5: Experiments) |
| Researcher Affiliation | Collaboration | Yuheng Jing (1,2), Bingyun Liu (1,2), Kai Li (1,2), Yifan Zang (1,2), Haobo Fu (6), Qiang Fu (6), Junliang Xing (5), Jian Cheng (1,3,4). Affiliations: 1 Institute of Automation, Chinese Academy of Sciences; 2 School of Artificial Intelligence, University of Chinese Academy of Sciences; ...; 5 Tsinghua University; 6 Tencent AI Lab |
| Pseudocode | Yes | Appendix B: Pseudocode of OMIS; Algorithm 1: Opponent Modeling with In-context Search (OMIS) |
| Open Source Code | Yes | See our project website at https://sites.google.com/view/nips2024-omis. |
| Open Datasets | No | To generate training data for pretraining the three components, we continually sample opponent policies from Π_train and use their corresponding BRs to play against them. For each episode, we sample a π^{-1,k} from Π_train as the opponent and use its BR π^{1,k,*} as the self-agent to play against it. (A sketch of this data-generation loop appears below the table.) |
| Dataset Splits | No | During the pretraining stage, opponent policies are sampled from a training set of opponent policies Π_train := {π^{-1,k}}_{k=1}^{K}. During the testing stage, opponent policies are sampled from a testing set of opponent policies Π_test, which includes an unknown number of unknown opponent policies. |
| Hardware Specification | Yes | CPU: AMD EPYC 7742 64-Core Processor × 2; GPU: NVIDIA GeForce RTX 3090 24G × 8; MEM: 500G |
| Software Dependencies | No | The backbone of the OMIS architecture is mainly implemented based on the causal Transformer, i.e., the GPT-2 [67] model from Hugging Face [93]. (A backbone sketch appears below the table.) |
| Experiment Setup | Yes | Appendix H: Hyperparameters; H.2: Hyperparameters for In-Context-Learning-based Pretraining |
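
The quoted data-generation procedure (sampling opponent policies from Π_train and letting their corresponding best responses play against them) can be illustrated with a minimal sketch. Everything below is an assumption for illustration: `Pi_train`, the two-player `env` interface, and the policies' `act` method are hypothetical placeholders, not the authors' released code.

```python
import random

def generate_pretraining_episodes(env, Pi_train, num_episodes):
    """Hypothetical sketch: continually sample an opponent policy from
    Pi_train and let its corresponding best response (BR) play against it,
    recording episodes for in-context-learning-based pretraining."""
    dataset = []
    for _ in range(num_episodes):
        # Pi_train: list of (opponent_policy, best_response_policy) pairs (assumed structure).
        opponent, best_response = random.choice(Pi_train)
        (obs_self, obs_opp), done = env.reset(), False
        episode = []
        while not done:
            a_self = best_response.act(obs_self)   # self-agent plays the BR
            a_opp = opponent.act(obs_opp)          # sampled training opponent
            (obs_self, obs_opp), reward, done, _ = env.step((a_self, a_opp))
            episode.append((obs_self, a_self, a_opp, reward))
        dataset.append(episode)
    return dataset
```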
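
Likewise, the quoted architecture note (a causal Transformer backbone built on Hugging Face's GPT-2) can be sketched as follows. The configuration values here are illustrative assumptions, not the paper's hyperparameters; the authors' actual settings are listed in their Appendix H.

```python
import torch
from transformers import GPT2Config, GPT2Model

# Illustrative sizes only (assumed, not taken from the paper).
config = GPT2Config(
    vocab_size=1,      # trajectory embeddings are fed directly, so the token vocab is unused
    n_positions=256,   # maximum in-context sequence length (assumed)
    n_embd=128,        # hidden size (assumed)
    n_layer=4,         # number of Transformer blocks (assumed)
    n_head=4,          # attention heads (assumed)
)
backbone = GPT2Model(config)

# Continuous context embeddings (e.g., embedded observation/action tokens)
# are passed via inputs_embeds; the causal mask provides in-context conditioning.
context = torch.randn(1, 64, config.n_embd)   # (batch, sequence, hidden)
hidden_states = backbone(inputs_embeds=context).last_hidden_state
```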