Evaluating Language Model Agency Through Negotiations
Authors: Tim Ruben Davidson, Veniamin Veselovsky, Michal Kosinski, Robert West
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use our approach to test six widely used and publicly accessible LMs, evaluating performance and alignment in both self-play and cross-play settings. |
| Researcher Affiliation | Academia | EPFL; Stanford University |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | Yes | We release our framework as an open-source library allowing other scholars and the OSS community to conveniently replicate and extend our findings. Our code and link to generated data are made available here: https://github.com/epfl-dlab/LAMEN. |
| Open Datasets | Yes | We release an open-source library and all data generated during this project (LAMEN transcripts) allowing other scholars and the OSS community to conveniently replicate and extend our findings. Our code and link to generated data are made available here: https://github.com/epfl-dlab/LAMEN. |
| Dataset Splits | No | The paper evaluates pre-existing language models and does not define training, validation, or test splits for model training or evaluation; instead, data are generated during the negotiation process itself. |
| Hardware Specification | No | The paper mentions 'compute support from the Microsoft Accelerate Foundation Model Academic Research program' and notes that models were accessed via 'paid APIs', but does not provide specific details on GPU models, CPU types, or other hardware specifications used for their experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | Table 6 lists the default parameters used for self-play and cross-play experiments across all models, e.g., note max words: 64 and message max words: 64 (see the configuration sketch below). |
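
As a rough illustration of how the Table 6 defaults might be expressed when configuring a negotiation run, the sketch below bundles them into a small config object covering both the word-count caps and the self-play vs. cross-play distinction. All names here (`NegotiationConfig`, `note_max_words`, the placeholder model identifiers) are hypothetical and are not taken from the LAMEN repository; consult the linked code for the actual interface.

```python
# Hypothetical sketch only: illustrates the Table 6 defaults (64-word caps on
# notes and messages) and the self-play vs. cross-play setup described in the
# paper. None of these names come from the LAMEN codebase.
from dataclasses import dataclass, asdict


@dataclass
class NegotiationConfig:
    note_max_words: int = 64      # default cap on an agent's private note per turn (Table 6)
    message_max_words: int = 64   # default cap on the message sent to the other agent (Table 6)
    model_a: str = "model-a"      # placeholder model identifier
    model_b: str = "model-a"      # same model on both sides -> self-play

    @property
    def is_self_play(self) -> bool:
        # Self-play: a model negotiates against itself; cross-play: two different models.
        return self.model_a == self.model_b


if __name__ == "__main__":
    self_play = NegotiationConfig()
    cross_play = NegotiationConfig(model_b="model-b")
    print("self-play:", self_play.is_self_play, asdict(self_play))
    print("cross-play:", cross_play.is_self_play, asdict(cross_play))
```

In this framing, a cross-play experiment is just a self-play configuration with a different model assigned to the second seat, which mirrors how the paper contrasts the two evaluation settings.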