Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Opponent Modeling with In-context Search
Authors: Yuheng Jing, Bingyun Liu, Kai Li, Yifan Zang, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, in competitive, cooperative, and mixed environments, OMIS demonstrates more effective and stable adaptation to opponents than other approaches. See our project website at https://sites.google.com/view/nips2024-omis. 5 Experiments |
| Researcher Affiliation | Collaboration | Yuheng Jing (1, 2), Bingyun Liu (1, 2), Kai Li (1, 2), Yifan Zang (1, 2), Haobo Fu (6), Qiang Fu (6), Junliang Xing (5), Jian Cheng (1, 3, 4). Affiliations: (1) Institute of Automation, Chinese Academy of Sciences; (2) School of Artificial Intelligence, University of Chinese Academy of Sciences; ... (5) Tsinghua University; (6) Tencent AI Lab |
| Pseudocode | Yes | Section B, "Pseudocode of OMIS"; Algorithm 1: Opponent Modeling with In-context Search (OMIS) |
| Open Source Code | Yes | See our project website at https://sites.google.com/view/nips2024-omis. |
| Open Datasets | No | To generate training data for pretraining the three components, we continually sample opponent policies from $\Pi^{\text{train}}$ and use their corresponding BR to play against them. For each episode, we sample a $\pi^{-1,k}$ from $\Pi^{\text{train}}$ as opponents and use its BR $\pi^{1,k,*}$ as self-agent to play against it. |
| Dataset Splits | No | During the pretraining stage, opponent policies are sampled from a training set of opponent policies $\Pi^{\text{train}} := \{\pi^{-1,k}\}_{k=1}^{K}$. During the testing stage, opponent policies are sampled from a testing set of opponent policies $\Pi^{\text{test}}$, which includes an unknown number of unknown opponent policies. |
| Hardware Specification | Yes | CPU: AMD EPYC 7742 64-Core Processor × 2; GPU: NVIDIA GeForce RTX 3090 24G × 8; MEM: 500G |
| Software Dependencies | No | The backbone of the OMIS architecture is mainly implemented based on the causal Transformer, i.e., GPT2 [67] model of Hugging Face [93]. |
| Experiment Setup | Yes | Section H, "Hyperparameters"; Section H.2, "Hyperparameters for In-Context-Learning-based Pretraining" |
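
The data-generation procedure quoted under "Open Datasets" (sample an opponent policy from the training set, then let its best response play against it) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper's code: the two-action matching game, the biased-coin opponents, and the names `make_biased_opponent`, `best_response`, and `generate_episodes` are all hypothetical stand-ins for the paper's environments and policies.

```python
import random
from typing import Callable, List, Tuple

# A policy maps a random generator to an action in {0, 1} (toy assumption).
Policy = Callable[[random.Random], int]
Step = Tuple[int, int, int]  # (self action, opponent action, self reward)


def make_biased_opponent(p_zero: float) -> Policy:
    """Hypothetical opponent that plays action 0 with probability p_zero."""
    def policy(rng: random.Random) -> int:
        return 0 if rng.random() < p_zero else 1
    return policy


def best_response(p_zero: float) -> Policy:
    """Best response in the toy matching game: the self-agent is rewarded
    for matching the opponent, so it plays the opponent's likelier action."""
    action = 0 if p_zero >= 0.5 else 1
    return lambda rng: action


def generate_episodes(train_biases: List[float], num_episodes: int,
                      steps: int, seed: int = 0) -> List[Tuple[int, List[Step]]]:
    """For each episode, sample an opponent index k from the training set
    and record the trajectory of its best response playing against it."""
    rng = random.Random(seed)
    data = []
    for _ in range(num_episodes):
        k = rng.randrange(len(train_biases))
        opponent = make_biased_opponent(train_biases[k])
        self_agent = best_response(train_biases[k])
        trajectory = []
        for _ in range(steps):
            a_self = self_agent(rng)
            a_opp = opponent(rng)
            reward = 1 if a_self == a_opp else -1
            trajectory.append((a_self, a_opp, reward))
        data.append((k, trajectory))
    return data


if __name__ == "__main__":
    # Three hypothetical training opponents, identified by their bias.
    episodes = generate_episodes([0.9, 0.1, 0.7], num_episodes=5, steps=4)
    for k, trajectory in episodes:
        print(k, trajectory)
```

Because such data is regenerated from the policy sets rather than released as a fixed file, a report like this one records "No" for open datasets and dataset splits even though the sampling procedure itself is described.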