Diverse Conventions for Human-AI Collaboration

Authors: Bidipta Sarkar, Andy Shih, Dorsa Sadigh

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We analyze the benefits of our technique on various multi-agent collaborative games, including Overcooked, and find that our technique can adapt to the conventions of humans, surpassing human-level performance when paired with real users. We evaluate our method on three environments: Blind Bandits, Balance Beam, and Overcooked [5, 34, 26]. Finally, we evaluate our generated pool of conventions in a user study by training an agent that is aware of the conventions in the pool and testing it with human users in Overcooked.
Researcher Affiliation | Academia | Bidipta Sarkar, Stanford University, bidiptas@stanford.edu; Andy Shih, Stanford University, andyshih@cs.stanford.edu; Dorsa Sadigh, Stanford University, dorsa@cs.stanford.edu
Pseudocode | Yes | Pseudocode and specific implementation details for the algorithm incorporating this loss, including the mixed-play buffer generation, is presented in Appendix A. Algorithm 1: Generating Mixed Play Buffer. Algorithm 2: Diverse Conventions with CoMeDi.
Open Source Code | Yes | Supplemental videos can be found on our website along with source code and anonymized user study data.
Open Datasets | Yes | Supplemental videos can be found on our website along with source code and anonymized user study data.
Dataset Splits | No | The paper describes training and testing phases but does not explicitly provide details about training/validation/test dataset splits, such as percentages, sample counts, or predefined split references for any of the environments or the user study.
Hardware Specification | Yes | We only used the Intel Xeon Silver 4214R CPU for training in Blind Bandits and Balance Beam. For the Overcooked experiments, we used an additional NVIDIA TITAN RTX, which required around 3 hours per configuration.
Software Dependencies | No | The paper mentions key software components like 'Multi-Agent PPO algorithm (MAPPO)', 'Pantheon RL library', 'GPU-accelerated simulation framework', and 'tensorflow-js', but it does not provide specific version numbers for any of these.
Experiment Setup | Yes | Table 3: Common hyperparameters for agents in Blind Bandits and Balance Beam. Table 4: Hyperparameters in Blind Bandits. Table 5: Hyperparameters in Balance Beam. Table 6: Hyperparameters in Overcooked (only training set). Table 7: Hyperparameters in Overcooked (convention-aware agents).