Select to Perfect: Imitating desired behavior from large multi-agent data
Authors: Tim Franzmeyer, Edith Elkind, Philip Torr, Jakob Nicolaus Foerster, Joao F. Henriques
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate how EVs can be estimated from fully-anonymized data and employ EV2BC (Def. 4.5) to learn policies aligned with the DVF, outperforming relevant baselines. The project website can be found at https://tinyurl.com/select-to-perfect. We run all experiments for five random seeds and report mean and standard deviation where applicable. For more details on the implementation, please refer to the Appendix. In the following experiments, we first evaluate EVs as a measure of an agent's contribution to a given DVF. We then assess the average estimation error for EVs as the number of observations in the dataset D decreases and how applying clustering decreases this error. We lastly evaluate the performance of Exchange Value based Behaviour Cloning (EV2BC, see Definition 4.5) for simulated and human datasets and compare to relevant baselines, such as standard Behavior Cloning (Pomerleau, 1991) and Offline Reinforcement Learning (Pan et al., 2022). |
| Researcher Affiliation | Academia | Tim Franzmeyer, Edith Elkind, Philip Torr, Jakob Foerster, João F. Henriques — University of Oxford, frtim@robots.ox.ac.uk |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Reproducibility. To help reproduce our work, we publish code on the project website at https://tinyurl.com/select-to-perfect. |
| Open Datasets | Yes | The Dhuman dataset was collected from humans playing the game (see Carroll et al. (2019)); it is fully anonymized with one-time-use agent identifiers, hence is a degenerate dataset (see Figure 2 bottom row). The StarCraft Multi-Agent Challenge (Samvelyan et al., 2019) is a cooperative multi-agent environment... |
| Dataset Splits | No | The paper refers to “dataset D” and “test set” in the context of evaluation, but does not explicitly provide details about train/validation/test dataset splits with percentages or counts for reproduction. |
| Hardware Specification | Yes | We used an Intel(R) Xeon(R) Silver 4116 CPU and an NVIDIA GeForce GTX 1080 Ti (only for training BC, EV2BC, group-BC, and OMAR policies). |
| Software Dependencies | No | The paper mentions various software components and algorithms used (e.g., k-means, PCA, SLSQP, scikit-learn) but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | In accordance with the quantity of available data, we set the threshold parameter such that only agents with EVs above the 90th, 67th, and 50th percentile are imitated in ToC, StarCraft, and Overcooked, respectively. We conducted a hyperparameter sweep for the following parameters: learning rate with options {0.01, 0.001, 0.0001}, Omar-coe with options {0.1, 1, 10}, Omar-iters with options {1, 3, 10}, and Omar-sigma with options {1, 2, 3}. |
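The experiment-setup row above describes two mechanical steps that a reproduction would need: selecting which agents to imitate via a percentile threshold on their Exchange Values, and enumerating the reported hyperparameter grid. The sketch below is illustrative only: the EVs are randomly generated stand-ins (the paper estimates them from anonymized data), and the function and parameter names (`select_agents`, `omar_coe`, etc.) are our labels for the quantities named in the table, not identifiers from the authors' code.

```python
import itertools
import numpy as np

# Stand-in exchange values for 100 agents; in the paper these are
# estimated from the (possibly anonymized) dataset, not sampled.
rng = np.random.default_rng(0)
exchange_values = rng.normal(size=100)

def select_agents(evs, percentile):
    """EV2BC-style selection: keep only agents whose EV exceeds the
    given percentile (90th for ToC, 67th for StarCraft, 50th for
    Overcooked, per the quoted setup)."""
    threshold = np.percentile(evs, percentile)
    return np.flatnonzero(evs > threshold)

selected = select_agents(exchange_values, 90)

# The hyperparameter grid quoted in the table (3 * 3 * 3 * 3 = 81 configs).
grid = {
    "learning_rate": [0.01, 0.001, 0.0001],
    "omar_coe": [0.1, 1, 10],
    "omar_iters": [1, 3, 10],
    "omar_sigma": [1, 2, 3],
}
configs = [dict(zip(grid, vals)) for vals in itertools.product(*grid.values())]
```

With 100 agents and a 90th-percentile cutoff, 10 agents are selected for imitation; the full sweep covers 81 configurations.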