Opponent Modeling with In-context Search

Authors: Yuheng Jing, Bingyun Liu, Kai Li, Yifan Zang, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng

NeurIPS 2024

Reproducibility assessment: each entry below gives the variable, the assessed result, and the LLM response quoting the paper.
Research Type: Experimental. LLM Response: "Empirically, in competitive, cooperative, and mixed environments, OMIS demonstrates more effective and stable adaptation to opponents than other approaches. See our project website at https://sites.google.com/view/nips2024-omis." (Section 5, Experiments)
Researcher Affiliation: Collaboration. LLM Response: "Yuheng Jing (1,2), Bingyun Liu (1,2), Kai Li (1,2), Yifan Zang (1,2), Haobo Fu (6), Qiang Fu (6), Junliang Xing (5), Jian Cheng (1,3,4); 1: Institute of Automation, Chinese Academy of Sciences; 2: School of Artificial Intelligence, University of Chinese Academy of Sciences; ...; 5: Tsinghua University; 6: Tencent AI Lab"
Pseudocode: Yes. LLM Response: "B Pseudocode of OMIS; Algorithm 1 Opponent Modeling with In-context Search (OMIS)" (A hedged sketch of such a search loop appears after this table.)
Open Source Code: Yes. LLM Response: "See our project website at https://sites.google.com/view/nips2024-omis."
Open Datasets: No. LLM Response: "To generate training data for pretraining the three components, we continually sample opponent policies from $\Pi_{\text{train}}$ and use their corresponding best responses (BRs) to play against them. For each episode, we sample a $\pi^{-1,k}$ from $\Pi_{\text{train}}$ as the opponent and use its BR $\pi^{1,k}$ as the self-agent to play against it." (See the data-generation sketch after this table.)
Dataset Splits: No. LLM Response: "During the pretraining stage, opponent policies are sampled from a training set of opponent policies $\Pi_{\text{train}} := \{\pi^{-1,k}\}_{k=1}^{K}$. During the testing stage, opponent policies are sampled from a testing set of opponent policies $\Pi_{\text{test}}$, which includes an unknown number of unknown opponent policies."
Hardware Specification: Yes. LLM Response: "CPU: AMD EPYC 7742 64-Core Processor ×2; GPU: NVIDIA GeForce RTX 3090 24G ×8; MEM: 500G"
Software Dependencies: No. LLM Response: "The backbone of the OMIS architecture is mainly implemented based on the causal Transformer, i.e., the GPT2 [67] model of Hugging Face [93]." (A backbone-instantiation sketch appears after this table.)
Experiment Setup: Yes. LLM Response: "H Hyperparameters; H.2 Hyperparameters for In-Context-Learning-based Pretraining"
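
On the pseudocode entry: the following is a minimal, hypothetical Python sketch of a decision-time search loop in the spirit of Algorithm 1, not the authors' implementation. It assumes three pretrained in-context components (an actor, an opponent imitator, and an environment simulator, which is our reading of "the three components" quoted under Open Datasets); every interface shown (`sample_actions`, `predict`, `step`, `greedy_action`) is a placeholder, not the paper's API.

```python
import numpy as np

def in_context_search_step(state, context, actor, imitator, simulator,
                           n_candidates=4, horizon=3):
    """Pick an action by scoring the actor's candidates with short
    imagined rollouts: the imitator stands in for the opponent and the
    simulator stands in for the environment. All components condition
    on the in-context history `context` of past interactions."""
    candidates = actor.sample_actions(state, context, n=n_candidates)
    returns = []
    for action in candidates:
        s, total, a = state, 0.0, action
        for _ in range(horizon):
            opp_a = imitator.predict(s, context)         # predicted opponent action
            s, r = simulator.step(s, a, opp_a, context)  # imagined transition and reward
            total += r
            a = actor.greedy_action(s, context)          # continue the rollout greedily
        returns.append(total)
    return candidates[int(np.argmax(returns))]
```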
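On the data-generation quote under Open Datasets: a minimal sketch of the described loop, in which an opponent policy $\pi^{-1,k}$ is sampled from $\Pi_{\text{train}}$ and its precomputed best response plays against it. The `reset`/`act`/`step` interfaces are hypothetical stand-ins, not the authors' code.

```python
import random

def generate_pretraining_data(train_policies, best_responses, env, n_episodes):
    """Sample opponent policies pi^{-1,k} from Pi_train and let the
    matching best response pi^{1,k} play against each, logging episodes."""
    dataset = []
    for _ in range(n_episodes):
        k = random.randrange(len(train_policies))  # pick opponent index k
        opponent, self_agent = train_policies[k], best_responses[k]
        obs, done, episode = env.reset(), False, []
        while not done:
            a_self = self_agent.act(obs)            # best-response action
            a_opp = opponent.act(obs)               # opponent action
            obs_next, reward, done = env.step(a_self, a_opp)
            episode.append((obs, a_self, a_opp, reward))
            obs = obs_next
        dataset.append(episode)
    return dataset
```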
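On the GPT2 backbone quoted under Software Dependencies: a minimal sketch of instantiating a causal GPT-2 with the Hugging Face `transformers` library. The layer/head/embedding sizes below are placeholders (the paper's actual values are listed in its Appendix H), and feeding trajectories through `inputs_embeds` is an assumption about how continuous tokens would be passed in, not a detail confirmed by the paper.

```python
from transformers import GPT2Config, GPT2Model

# Placeholder sizes; Appendix H of the paper lists the real hyperparameters.
config = GPT2Config(n_layer=4, n_head=4, n_embd=128, n_positions=1024)
backbone = GPT2Model(config)

# Trajectory tokens would typically be fed as precomputed embeddings
# rather than vocabulary ids, e.g.:
#   hidden = backbone(inputs_embeds=seq).last_hidden_state
#   # seq: FloatTensor of shape (batch, timesteps, n_embd)
```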