On the Critical Role of Conventions in Adaptive Human-AI Collaboration
Authors: Andy Shih, Arjun Sawhney, Jovana Kondic, Stefano Ermon, Dorsa Sadigh
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we propose a learning framework that teases apart rule-dependent representation from convention-dependent representation in a principled way. We show that, under some assumptions, our rule-dependent representation is a sufficient statistic of the distribution over best-response strategies across partners. Using this separation of representations, our agents are able to adapt quickly to new partners, and to coordinate with old partners on new tasks in a zero-shot manner. We experimentally validate our approach on three collaborative tasks varying in complexity: a contextual multi-armed bandit, a block placing task, and the card game Hanabi. |
| Researcher Affiliation | Academia | Andy Shih, Arjun Sawhney, Jovana Kondic, Stefano Ermon & Dorsa Sadigh; Stanford University, Princeton University; {andyshih,arjunsawhney,ermon,dorsa}@cs.stanford.edu, jkondic@princeton.edu |
| Pseudocode | Yes | Algorithm 1: Learning Separate Representations for Partners and Tasks (an illustrative modular-policy sketch follows the table) |
| Open Source Code | Yes | Code for the experiments in our paper is available at https://github.com/Stanford-ILIAD/Conventions-ModularPolicy. |
| Open Datasets | Yes | Task setup: We used the Hanabi Learning Environment package (Bard et al., 2020), with the following configuration: 1 color, 5 ranks, 2 players, hand size 2, 3 information tokens, and 3 life tokens. The maximum score is 5 points. (A configuration sketch follows the table.) |
| Dataset Splits | No | The paper discusses training and adapting to partners and tasks, but it does not explicitly provide specific dataset splits (e.g., percentages or sample counts for training, validation, and test sets) for reproducibility of data partitioning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU models, GPU types, or memory configurations. |
| Software Dependencies | No | The paper mentions using 'Proximal Policy Optimization (PPO)' and the 'Stable Baselines software package (Raffin et al., 2019)' but does not specify version numbers for these software libraries or other key dependencies beyond the publication year of the Stable Baselines paper. |
| Experiment Setup | Yes | Appendix B ARCHITECTURE DETAILS AND HYPERPARAMETERS contains tables that list specific hyperparameters for training and adapting models. For example, Table 2 lists 'Timesteps', 'Minibatch size', 'Num. epochs', and 'Learning Rate' for training self-play partners, and Table 1 details layer sizes for the modules. (A PPO training sketch follows the table.) |
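
The Research Type and Pseudocode rows point to the paper's central idea: a rule-dependent (task) representation shared across partners, composed with convention-dependent (partner) modules. The sketch below is a minimal illustration of that separation, not the paper's actual Algorithm 1 or architecture; the class names, layer sizes, latent dimension, and the way the two modules are composed are all assumptions.

```python
# Illustrative only: a modular policy that separates a shared, rule-dependent
# task module from per-partner, convention-dependent heads. Names, sizes, and
# the composition order are assumptions, not the authors' released code.
import torch
import torch.nn as nn

class TaskModule(nn.Module):
    """Rule-dependent representation, shared across all partners."""
    def __init__(self, obs_dim, z_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, z_dim), nn.ReLU())

    def forward(self, obs):
        return self.net(obs)

class PartnerModule(nn.Module):
    """Convention-dependent head, trained separately for each partner."""
    def __init__(self, z_dim, act_dim):
        super().__init__()
        self.head = nn.Linear(z_dim, act_dim)

    def forward(self, z):
        return torch.distributions.Categorical(logits=self.head(z))

class ModularPolicy(nn.Module):
    """Compose the shared task module with one partner-specific module."""
    def __init__(self, task_module, partner_module):
        super().__init__()
        self.task_module = task_module
        self.partner_module = partner_module

    def forward(self, obs):
        return self.partner_module(self.task_module(obs))

# Adapting to a new partner reuses the shared task module and trains only a
# new head, e.g.:
# new_policy = ModularPolicy(shared_task_module, PartnerModule(64, act_dim))
```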
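The Open Datasets row quotes the Hanabi task setup (1 color, 5 ranks, 2 players, hand size 2, 3 information tokens, 3 life tokens). Below is a minimal sketch of instantiating that configuration with the hanabi_learning_environment package's rl_env interface; whether the authors used this exact entry point, observation type, or seed handling is an assumption.

```python
# Minimal sketch: building the Hanabi configuration quoted above with the
# hanabi_learning_environment package. The authors' exact setup may differ.
from hanabi_learning_environment import rl_env, pyhanabi

config = {
    "colors": 1,                   # 1 color
    "ranks": 5,                    # 5 ranks, so the maximum score is 5
    "players": 2,                  # 2 players
    "hand_size": 2,                # hand size 2
    "max_information_tokens": 3,   # 3 information tokens
    "max_life_tokens": 3,          # 3 life tokens
    # Observation type is an assumption; CARD_KNOWLEDGE is the package default.
    "observation_type": pyhanabi.AgentObservationType.CARD_KNOWLEDGE.value,
}

env = rl_env.HanabiEnv(config=config)
obs = env.reset()
```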
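The Software Dependencies and Experiment Setup rows mention PPO via the Stable Baselines package and hyperparameters such as timesteps, minibatch size, number of epochs, and learning rate. The following is a hedged sketch written against the Stable-Baselines3 API, which may not match the version the authors cite; the environment and every hyperparameter value are placeholders, not the numbers from the paper's Table 2.

```python
# Hedged sketch of a PPO training loop in the style of Stable-Baselines3.
# The environment and all hyperparameter values are placeholders; the paper's
# Appendix B gives the actual architecture and values, and the authors'
# Stable Baselines version may expose a different API.
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")  # stand-in for the collaborative task environment

model = PPO(
    "MlpPolicy",
    env,
    batch_size=64,       # "Minibatch size" in Table 2 (placeholder value)
    n_epochs=10,         # "Num. epochs" (placeholder value)
    learning_rate=3e-4,  # "Learning Rate" (placeholder value)
    verbose=1,
)
model.learn(total_timesteps=100_000)  # "Timesteps" (placeholder value)
model.save("selfplay_partner")
```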