How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis
Authors: Federico Bianchi, Patrick John Chia, Mert Yuksekgonul, Jacopo Tagliabue, Dan Jurafsky, James Zou
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we study how well LLMs can negotiate with each other. We develop NEGOTIATIONARENA: a flexible framework for evaluating and probing the negotiation abilities of LLM agents. We implemented three types of scenarios in NEGOTIATIONARENA to assess LLMs' behaviors in allocating shared resources (ultimatum games), aggregating resources (trading games) and buying/selling goods (price negotiations). |
| Researcher Affiliation | Collaboration | 1Stanford University, Stanford, California 2Independent 3Bauplan, New York, New York. |
| Pseudocode | No | The paper describes the system implementation and communication format using XML-like tags, but it does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our contributions: We propose NEGOTIATIONARENA: an open-source framework to evaluate and probe the negotiation abilities of LLM agents. NEGOTIATIONARENA is available at https://github.com/vinid/NegotiationArena. |
| Open Datasets | No | The paper defines and implements custom negotiation scenarios rather than using a pre-existing publicly available dataset. It describes how interactions are generated within its framework for evaluation. |
| Dataset Splits | No | The paper specifies the number of negotiations run per pair of agents ("We run 60 negotiations for each ordered pair of agents in each scenario."), but it does not provide specific details on traditional dataset splits (e.g., train/validation/test percentages or sample counts) for a dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions that NEGOTIATIONARENA is "implemented in Python," but it does not provide specific version numbers for Python or any other libraries or frameworks used. |
| Experiment Setup | Yes | We run 60 negotiations for each ordered pair of agents in each scenario. Both GPT and Claude are using a temperature of 0.7 and they can generate a response of a maximum of 400 tokens. We add behavioral prompts to the system prompt of each game. |
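The paper states that agents exchange structured, XML-like messages rather than formal pseudocode. As a minimal sketch of what parsing such a turn could look like, the snippet below extracts tagged fields from a raw LLM response. The tag names (`message`, `proposal`, `decision`) and the helper `parse_turn` are illustrative assumptions, not taken verbatim from the NegotiationArena source.

```python
import re

def parse_turn(raw: str) -> dict:
    """Extract XML-like tagged fields from one agent's raw LLM response.

    Tag names here are hypothetical examples of the paper's
    communication format, not the framework's actual schema.
    """
    fields = {}
    for tag in ("message", "proposal", "decision"):
        m = re.search(rf"<{tag}>(.*?)</{tag}>", raw, re.DOTALL)
        if m:
            fields[tag] = m.group(1).strip()
    return fields

# Example response in the assumed format:
raw_response = (
    "<message>I'll trade 3 of my X for 2 of your Y.</message>"
    "<proposal>RED gives X: 3 | BLUE gives Y: 2</proposal>"
)
print(parse_turn(raw_response))
```

Structured tags like these let the framework separate free-form persuasion text from machine-readable offers, which is what makes automated scoring of 60 negotiations per agent pair tractable.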