reproducibilityindex.ai

Learning Zero-Shot Cooperation with Humans, Assuming Humans Are Biased

Authors: Chao Yu, Jiaxuan Gao, Weilin Liu, Botian Xu, Hao Tang, Jiaqi Yang, Yu Wang, Yi Wu

ICLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate HSP on the Overcooked benchmark. Empirical results show that our HSP method produces higher rewards than baselines when cooperating with learned human models, manually scripted policies, and real humans.
Researcher Affiliation	Academia	1 Tsinghua University, 2 UC Berkeley, 3 Shanghai Qi Zhi Institute
Pseudocode	Yes	Algorithm 1: Greedy Policy Selection
Open Source Code	No	We would suggest visiting https://sites.google.com/view/hsp-iclr for more information. The paper does not explicitly state that source code for the methodology is provided, nor does it link directly to a source code repository.
Open Datasets	Yes	Overcooked Game: Overcooked (Carroll et al., 2019), which is a fully observable two-player cooperative game.
Dataset Splits	No	The paper does not specify explicit training/validation/test dataset splits with percentages or counts, as it operates within a reinforcement learning environment rather than a static dataset.
Hardware Specification	No	The paper mentions inference being performed on 'CPUs' and 'a GPU' but does not specify any particular hardware models, types, or quantities (e.g., NVIDIA A100, Intel Xeon).
Software Dependencies	No	The paper mentions using 'MAPPO' as the RL algorithm, but does not provide specific version numbers for any software dependencies like programming languages (e.g., Python 3.x) or deep learning frameworks (e.g., PyTorch 1.x).
Experiment Setup	Yes	Common hyperparameters for all methods in 5 layouts are listed in Table 8 and Table 9. Specifically, for MEP, we use the suggested hyperparameters from the original paper (Zhao et al., 2021). Detailed hyperparameters of MEP are shown in Table 10, where population entropy coef. adjusts the importance of the population entropy term. Detailed hyperparameters of TrajDiv are shown in Table 11, where traj. gamma is the discount factor used in the local action kernel and diversity coef. adjusts the importance of the diversity term.
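
The Pseudocode row names Algorithm 1 (Greedy Policy Selection) but this page does not reproduce it. For orientation only, here is a minimal sketch of what a greedy selection over a pool of candidate policies could look like. Everything in it is an assumption made for illustration and is not taken from the paper: each policy is summarized by a hypothetical numeric feature vector, diversity is scored as the sum of pairwise L1 distances, and the selection is seeded with the first candidate. The names `pairwise_diversity` and `greedy_select` are likewise hypothetical.

```python
from itertools import combinations


def pairwise_diversity(features, subset):
    """Sum of L1 distances between every pair of feature vectors in `subset`.

    `features` is a list of equal-length numeric vectors (a hypothetical
    per-policy summary); `subset` is a list of indices into it.
    """
    return sum(
        sum(abs(a - b) for a, b in zip(features[i], features[j]))
        for i, j in combinations(subset, 2)
    )


def greedy_select(features, k):
    """Greedily pick k policy indices.

    At each step, add the remaining candidate that yields the highest
    diversity score for the enlarged subset. Ties go to the lowest index.
    """
    if not 0 < k <= len(features):
        raise ValueError("k must be between 1 and the number of candidates")
    selected = [0]  # assumption: seed the subset with the first candidate
    while len(selected) < k:
        remaining = [i for i in range(len(features)) if i not in selected]
        best = max(
            remaining,
            key=lambda i: pairwise_diversity(features, selected + [i]),
        )
        selected.append(best)
    return selected


if __name__ == "__main__":
    # Toy feature vectors for four hypothetical policies.
    toy_features = [[0, 0], [1, 0], [10, 10], [0, 1]]
    print(greedy_select(toy_features, 2))
```

Greedy selection trades optimality for cost: it evaluates each remaining candidate once per step rather than scoring every possible subset of size k. Consult Algorithm 1 in the paper for the actual selection criterion and diversity measure.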