Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy
Authors: Zhenyu Guan, Xiangyu Kong, Fangwei Zhong, Yizhou Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experiments, our goal is to answer the following questions: 1) Mastery of Non-Press Diplomacy: Can our agent master no-press Diplomacy against baselines? 2) Competing with State-of-the-Art: Can our agent surpass the performance of the current state-of-the-art agents in press Diplomacy? 3) Compatibility with LLMs: Is our self-evolving framework compatible with different LLMs? 4) Contribution of Each Module: Do the individual modules within our framework contribute to the overall improvement of our agent's performance? We evaluate the models based on the results of multiple rounds of games. In each round, the model is randomly assigned a country to control. Typically, 1000 rounds are played to obtain the average results. The evaluation metrics include the win rate, most-SC (supply center) rate, survival rate, and defeat rate. |
| Researcher Affiliation | Academia | Institute for Artificial Intelligence, Peking University; College of Computer Science, Beijing Information Science and Technology University; School of Artificial Intelligence, Beijing Normal University; Center on Frontiers of Computing Studies, School of Computer Science, Nat'l Eng. Research Center of Visual Technology, Peking University; State Key Laboratory of General Artificial Intelligence, BIGAI. Corresponding authors: xykong@bistu.edu.cn, fangweizhong@bnu.edu.cn |
| Pseudocode | No | The paper describes the system components and their interactions verbally and through diagrams, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | Project page: https://sites.google.com/view/richelieu-diplomacy. (The project page states “Code is not available yet” as of May 2024, indicating no concrete access to the code for the described methodology.) |
| Open Datasets | Yes | The widely used open-source Diplomacy game platform introduced by [Paquette et al., 2019] is adopted for evaluating Richelieu against other models. The platform makes it easy to switch between no-press (no negotiation between players) and press (with negotiation between players) games, facilitating a comparison of both settings. The platform also contains over 10,000 human game records on which previous approaches were trained. |
| Dataset Splits | No | The paper describes the experimental setup and evaluation protocol involving self-play and competition rounds, but it does not specify traditional training, validation, or test dataset splits in terms of fixed percentages or sample counts for a static dataset, as their method uses self-generated data. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions the use of LLMs like GPT-4 and Llama 3 and the Diplomacy game platform by Paquette et al., 2019, but it does not specify version numbers for any software libraries, programming languages, or frameworks used in the implementation. |
| Experiment Setup | Yes | In the experiments, we set a temperature of 0.3 to ensure relatively stable generation of LLM policies. For head-to-head comparisons, we randomly assign three countries to one model and the remaining four to another. |
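The evaluation protocol quoted above (random country assignment over many rounds, with win / most-SC / survival / defeat rates tallied per model) can be sketched as a small harness. This is a minimal illustration, not the authors' code: `play_game` is a hypothetical game runner standing in for the Paquette et al. Diplomacy engine, and the outcome labels are assumptions matching the four metrics named in the table.

```python
import random
from collections import Counter

# The seven great powers in standard Diplomacy.
POWERS = ["Austria", "England", "France", "Germany", "Italy", "Russia", "Turkey"]

def assign_powers(rng):
    """Randomly give three powers to model "A" and the remaining four to model "B"."""
    powers = POWERS[:]
    rng.shuffle(powers)
    return {p: ("A" if i < 3 else "B") for i, p in enumerate(powers)}

def evaluate(play_game, n_rounds=1000, seed=0):
    """Tally per-model rates over n_rounds games.

    `play_game(assignment, rng)` is a hypothetical runner returning, for each
    power, one outcome label in {"win", "most_sc", "survived", "defeated"}.
    """
    rng = random.Random(seed)
    tallies = {"A": Counter(), "B": Counter()}
    counts = {"A": 0, "B": 0}
    for _ in range(n_rounds):
        assignment = assign_powers(rng)       # fresh random 3-vs-4 split each round
        outcomes = play_game(assignment, rng)
        for power, outcome in outcomes.items():
            model = assignment[power]
            tallies[model][outcome] += 1
            counts[model] += 1
    # Normalize tallies into rates per power-game played by each model.
    return {m: {k: v / counts[m] for k, v in tallies[m].items()} for m in tallies}
```

A stub `play_game` that returns a label per power is enough to exercise the harness; in the paper's setup the engine itself would decide who wins, holds the most supply centers, survives, or is eliminated.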