Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
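One simple way to check an automated labeler against a manually labeled dataset is per-variable agreement (accuracy): for each reproducibility variable, the fraction of papers where the LLM label matches the manual label. The sketch below is only an illustration of that idea. The function name `per_variable_accuracy`, the `(paper_id, variable) -> label` data layout, and the toy labels are hypothetical and are not the metrics, code, or data of Coakley et al.

```python
from collections import defaultdict


def per_variable_accuracy(llm_labels, manual_labels):
    """Hypothetical helper: per-variable agreement between LLM and manual labels.

    Both arguments map (paper_id, variable) -> label string, e.g.
    ("p1", "Open Source Code") -> "Yes". Only keys present in the manual
    (reference) labels are scored; a missing LLM label counts as a mismatch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for key, gold in manual_labels.items():
        _, variable = key
        total[variable] += 1
        if llm_labels.get(key) == gold:
            correct[variable] += 1
    # Accuracy = matching labels / manually labeled papers, per variable.
    return {v: correct[v] / total[v] for v in total}


# Toy, made-up labels for two hypothetical papers.
manual = {
    ("p1", "Open Source Code"): "Yes",
    ("p2", "Open Source Code"): "No",
    ("p1", "Dataset Splits"): "No",
    ("p2", "Dataset Splits"): "Yes",
}
llm = {
    ("p1", "Open Source Code"): "Yes",
    ("p2", "Open Source Code"): "Yes",
    ("p1", "Dataset Splits"): "No",
    ("p2", "Dataset Splits"): "Yes",
}
print(per_variable_accuracy(llm, manual))
```

Plain accuracy is the easiest agreement measure to read, but it can look optimistic when one label (for example "Yes") dominates a variable; that skew is one reason scores like those below are best treated as estimates.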
Diverse Conventions for Human-AI Collaboration
Authors: Bidipta Sarkar, Andy Shih, Dorsa Sadigh
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analyze the benefits of our technique on various multi-agent collaborative games, including Overcooked, and find that our technique can adapt to the conventions of humans, surpassing human-level performance when paired with real users. We evaluate our method on three environments: Blind Bandits, Balance Beam, and Overcooked [5, 34, 26]. Finally, we evaluate our generated pool of conventions in a user study by training an agent that is aware of the conventions in the pool and testing it with human users in Overcooked. |
| Researcher Affiliation | Academia | Bidipta Sarkar (Stanford University, EMAIL); Andy Shih (Stanford University, EMAIL); Dorsa Sadigh (Stanford University, EMAIL) |
| Pseudocode | Yes | Pseudocode and specific implementation details for the algorithm incorporating this loss, including the mixed-play buffer generation, are presented in Appendix A. Algorithm 1: Generating Mixed Play Buffer. Algorithm 2: Diverse Conventions with CoMeDi. |
| Open Source Code | Yes | Supplemental videos can be found on our website along with source code and anonymized user study data. |
| Open Datasets | Yes | Supplemental videos can be found on our website along with source code and anonymized user study data. |
| Dataset Splits | No | The paper describes training and testing phases but does not explicitly provide details about training/validation/test dataset splits, such as percentages, sample counts, or predefined split references for any of the environments or the user study. |
| Hardware Specification | Yes | We only used the Intel Xeon Silver 4214R CPU for training in Blind Bandits and Balance Beam. For the Overcooked experiments, we used an additional NVIDIA TITAN RTX, which required around 3 hours per configuration. |
| Software Dependencies | No | The paper mentions key software components like 'Multi-Agent PPO algorithm (MAPPO)', 'Pantheon RL library', 'GPU-accelerated simulation framework', and 'tensorflow-js', but it does not provide specific version numbers for any of these. |
| Experiment Setup | Yes | Table 3: Common hyperparameters for agents in Blind Bandits and Balance Beam. Table 4: Hyperparameters in Blind Bandits. Table 5: Hyperparameters in Balance Beam. Table 6: Hyperparameters in Overcooked (only training set). Table 7: Hyperparameters in Overcooked (convention-aware agents). |