Diverse Conventions for Human-AI Collaboration
Authors: Bidipta Sarkar, Andy Shih, Dorsa Sadigh
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analyze the benefits of our technique on various multi-agent collaborative games, including Overcooked, and find that our technique can adapt to the conventions of humans, surpassing human-level performance when paired with real users. We evaluate our method on three environments: Blind Bandits, Balance Beam, and Overcooked [5, 34, 26]. Finally, we evaluate our generated pool of conventions in a user study by training an agent that is aware of the conventions in the pool and testing it with human users in Overcooked. |
| Researcher Affiliation | Academia | Bidipta Sarkar, Stanford University, bidiptas@stanford.edu; Andy Shih, Stanford University, andyshih@cs.stanford.edu; Dorsa Sadigh, Stanford University, dorsa@cs.stanford.edu |
| Pseudocode | Yes | Pseudocode and specific implementation details for the algorithm incorporating this loss, including the mixed-play buffer generation, are presented in Appendix A. Algorithm 1: Generating Mixed Play Buffer. Algorithm 2: Diverse Conventions with CoMeDi. |
| Open Source Code | Yes | Supplemental videos can be found on our website along with source code and anonymized user study data. |
| Open Datasets | Yes | Supplemental videos can be found on our website along with source code and anonymized user study data. |
| Dataset Splits | No | The paper describes training and testing phases but does not explicitly provide details about training/validation/test dataset splits, such as percentages, sample counts, or predefined split references for any of the environments or the user study. |
| Hardware Specification | Yes | We only used the Intel Xeon Silver 4214R CPU for training in Blind Bandits and Balance Beam. For the Overcooked experiments, we used an additional NVIDIA TITAN RTX, which required around 3 hours per configuration. |
| Software Dependencies | No | The paper mentions key software components like 'Multi-Agent PPO algorithm (MAPPO)', 'Pantheon RL library', 'GPU-accelerated simulation framework', and 'tensorflow-js', but it does not provide specific version numbers for any of these. |
| Experiment Setup | Yes | Table 3: Common hyperparameters for agents in Blind Bandits and Balance Beam. Table 4: Hyperparameters in Blind Bandits. Table 5: Hyperparameters in Balance Beam. Table 6: Hyperparameters in Overcooked (only training set). Table 7: Hyperparameters in Overcooked (convention-aware agents). |