Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy
Authors: Zhenyu Guan, Xiangyu Kong, Fangwei Zhong, Yizhou Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experiments, our goal is to answer the following questions: 1) Mastery of Non-Press Diplomacy: Can our agent master no-press Diplomacy against baselines? 2) Competing with State-of-the-Art: Can our agent surpass the performance of the current state-of-the-art agents in press Diplomacy? 3) Compatibility with LLMs: Is our self-evolving framework compatible with different LLMs? 4) Contribution of Each Module: Do the individual modules within our framework contribute to the overall improvement of our agent's performance? We evaluate the models based on the results of multiple rounds of games. In each round, the model is randomly assigned a country to control. Typically, 1000 rounds are played to obtain the average results. The evaluation metrics include the win rate, most-SC (supply center) rate, survival rate, and defeat rate. |
| Researcher Affiliation | Academia | Institute for Artificial Intelligence, Peking University; College of Computer Science, Beijing Information Science and Technology University; School of Artificial Intelligence, Beijing Normal University; Center on Frontiers of Computing Studies, School of Computer Science, Nat'l Eng. Research Center of Visual Technology, Peking University; State Key Laboratory of General Artificial Intelligence, BIGAI. Corresponding authors: xykong@bistu.edu.cn, fangweizhong@bnu.edu.cn |
| Pseudocode | No | The paper describes the system components and their interactions verbally and through diagrams, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | Project page: https://sites.google.com/view/richelieu-diplomacy. (The project page states “Code is not available yet” as of May 2024, indicating no concrete access to the code for the described methodology.) |
| Open Datasets | Yes | The widely used open-source Diplomacy game platform introduced by [Paquette et al., 2019] is adopted for evaluating Richelieu against other models. The platform makes it easy to switch between no-press (no negotiation between players) and press (with negotiation between players) games, facilitating a comparison of both settings. The platform also contains over 10,000 human game records on which previous approaches were trained. |
| Dataset Splits | No | The paper describes the experimental setup and evaluation protocol involving self-play and competition rounds, but it does not specify traditional training, validation, or test dataset splits in terms of fixed percentages or sample counts for a static dataset, as their method uses self-generated data. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions the use of LLMs like GPT-4 and Llama 3 and the Diplomacy game platform by Paquette et al., 2019, but it does not specify version numbers for any software libraries, programming languages, or frameworks used in the implementation. |
| Experiment Setup | Yes | In the experiments, we set a temperature of 0.3 to ensure relatively stable generation of LLM policies. For head-to-head comparisons, we randomly assign three countries to one model and the remaining four to another. |
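The evaluation protocol quoted above (random country assignment over many rounds, with win / most-SC / survival / defeat rates tallied per model) can be sketched as a small harness. This is a minimal illustration, not the authors' code: `play_game` is a hypothetical game runner standing in for the Paquette et al. Diplomacy engine, and the outcome labels are assumptions matching the four metrics named in the table.

```python
import random
from collections import Counter

# The seven great powers in standard Diplomacy.
POWERS = ["Austria", "England", "France", "Germany", "Italy", "Russia", "Turkey"]

def assign_powers(rng):
    """Randomly give three powers to model "A" and the remaining four to model "B"."""
    powers = POWERS[:]
    rng.shuffle(powers)
    return {p: ("A" if i < 3 else "B") for i, p in enumerate(powers)}

def evaluate(play_game, n_rounds=1000, seed=0):
    """Tally per-model rates over n_rounds games.

    `play_game(assignment, rng)` is a hypothetical runner returning, for each
    power, one outcome label in {"win", "most_sc", "survived", "defeated"}.
    """
    rng = random.Random(seed)
    tallies = {"A": Counter(), "B": Counter()}
    counts = {"A": 0, "B": 0}
    for _ in range(n_rounds):
        assignment = assign_powers(rng)       # fresh random 3-vs-4 split each round
        outcomes = play_game(assignment, rng)
        for power, outcome in outcomes.items():
            model = assignment[power]
            tallies[model][outcome] += 1
            counts[model] += 1
    # Normalize tallies into rates per power-game played by each model.
    return {m: {k: v / counts[m] for k, v in tallies[m].items()} for m in tallies}
```

A stub `play_game` that returns a label per power is enough to exercise the harness; in the paper's setup the engine itself would decide who wins, holds the most supply centers, survives, or is eliminated.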