Select to Perfect: Imitating desired behavior from large multi-agent data

Authors: Tim Franzmeyer, Edith Elkind, Philip Torr, Jakob Nicolaus Foerster, Joao F. Henriques

ICLR 2024

Reproducibility
Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate how EVs can be estimated from fully-anonymized data and employ EV2BC (Def. 4.5) to learn policies aligned with the DVF, outperforming relevant baselines. The project website can be found at https://tinyurl.com/select-to-perfect. We run all experiments for five random seeds and report mean and standard deviation where applicable. For more details on the implementation, please refer to the Appendix. In the following experiments, we first evaluate EVs as a measure of an agent's contribution to a given DVF. We then assess the average estimation error for EVs as the number of observations in the dataset D decreases, and how applying clustering decreases this error. We lastly evaluate the performance of Exchange Value based Behaviour Cloning (EV2BC, see Definition 4.5) on simulated and human datasets and compare to relevant baselines, such as standard Behavior Cloning (Pomerleau, 1991) and Offline Reinforcement Learning (Pan et al., 2022).
Researcher Affiliation | Academia | Tim Franzmeyer, Edith Elkind, Philip Torr, Jakob Foerster, João F. Henriques (University of Oxford; frtim@robots.ox.ac.uk)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Reproducibility. To help reproduce our work, we publish code on the project website at https://tinyurl.com/select-to-perfect.
Open Datasets | Yes | The D_human dataset was collected from humans playing the game (see Carroll et al. (2019)); it is fully anonymized with one-time-use agent identifiers, hence is a degenerate dataset (see Figure 2, bottom row). The StarCraft Multi-Agent Challenge (Samvelyan et al., 2019) is a cooperative multi-agent environment...
Dataset Splits | No | The paper refers to "dataset D" and "test set" in the context of evaluation, but does not explicitly provide train/validation/test split percentages or counts needed for reproduction.
Hardware Specification | Yes | We used an Intel(R) Xeon(R) Silver 4116 CPU and an NVIDIA GeForce GTX 1080 Ti (only for training BC, EV2BC, group-BC, and OMAR policies).
Software Dependencies | No | The paper mentions various software components and algorithms used (e.g., k-means, PCA, SLSQP, scikit-learn) but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | In accordance with the quantity of available data, we set the threshold parameter such that only agents with EVs above the 90th, 67th, and 50th percentile are imitated in ToC, StarCraft, and Overcooked, respectively. We conducted a hyperparameter sweep for the following parameters: learning rate with options {0.01, 0.001, 0.0001}, Omar-coe with options {0.1, 1, 10}, Omar-iters with options {1, 3, 10}, and Omar-sigma with options {1, 2, 3}.
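The selection step behind EV2BC, as described in the quotes above, can be sketched as follows. This is an illustrative reconstruction, not the authors' released code: the function name `ev2bc_select` and the data layout (per-agent lists of state-action pairs, plus a dict of precomputed Exchange Values) are assumptions; the paper's actual EV estimation from anonymized data is more involved.

```python
import numpy as np

def ev2bc_select(trajectories, exchange_values, percentile):
    """Keep only demonstrations from agents whose estimated Exchange
    Value (EV) lies above the given percentile, then pool their
    state-action pairs for standard behavior cloning.

    trajectories: {agent_id: [(state, action), ...]}
    exchange_values: {agent_id: estimated EV under the chosen DVF}
    percentile: e.g. 90 for ToC, 67 for StarCraft, 50 for Overcooked
    """
    evs = np.array([exchange_values[a] for a in trajectories])
    threshold = np.percentile(evs, percentile)
    selected = {a: t for a, t in trajectories.items()
                if exchange_values[a] >= threshold}
    # Pool state-action pairs from high-EV agents only; a standard
    # behavior-cloning learner would then be fit on `pooled`.
    pooled = [pair for traj in selected.values() for pair in traj]
    return selected, pooled
```

With a 50th-percentile threshold (as in Overcooked), roughly the top half of agents by EV contribute training data; the downstream imitation step is unchanged from ordinary behavior cloning.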
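The reported hyperparameter sweep is a full grid over four parameters. A minimal sketch of enumerating that grid (the snake_case keys such as `omar_coe` are my renderings of the paper's "Omar-coe" etc., and the enumeration itself is generic, not taken from the authors' code):

```python
from itertools import product

# Grid as reported in the experiment-setup quote above.
GRID = {
    "learning_rate": [0.01, 0.001, 0.0001],
    "omar_coe": [0.1, 1, 10],
    "omar_iters": [1, 3, 10],
    "omar_sigma": [1, 2, 3],
}

def sweep_configs(grid):
    """Yield one config dict per point in the Cartesian product."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))
```

The grid has 3 x 3 x 3 x 3 = 81 configurations; with the five random seeds reported above, a full sweep would mean 405 training runs per environment.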