Beyond Rewards: a Hierarchical Perspective on Offline Multiagent Behavioral Analysis
Authors: Shayegan Omidshafiei, Andrei Kapishnikov, Yannick Assogba, Lucas Dixon, Been Kim
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments investigate the structure of the learned behavior space, which goes beyond prior works on latent-clustering by identifying relationships between individual agent and joint behaviors. We illustrate that clusters identified by MOHBA are useful for highlighting similarities and differences in behaviors throughout training. We also quantitatively analyze the completeness of discovered behavior clusters by adopting a modified version of the concept-discovery framework of Yeh et al. [13] to identify interesting behavior concepts in our multiagent setting. We then test the scalability of our approach by using it for behavioral analysis of several high-dimensional multiagent MuJoCo environments [14]. Finally, we evaluate the approach on the open-sourced OpenAI hide-and-seek policy checkpoints [10], confirming that the behavioral clusters detected by MOHBA closely match those of the human-expert annotated labels provided in their policy checkpoints. |
| Researcher Affiliation | Industry | Shayegan Omidshafiei somidshafiei@google.com Andrei Kapishnikov kapishnikov@google.com Yannick Assogba yassogba@google.com Lucas Dixon ldixon@google.com Been Kim beenkim@google.com Google Research |
| Pseudocode | Yes | Appendix A.7 provides pseudocode. |
| Open Source Code | No | The paper states: “We provide details for experiment reproducibility in Appendix A.2. We also include model high-level code in Appendix A.7.” Appendix A.7 contains pseudocode, not runnable open-source code for the methodology, and no external link is provided. |
| Open Datasets | Yes | Finally, we evaluate the approach on the open-sourced OpenAI hide-and-seek policy checkpoints [10], confirming that the behavioral clusters detected by MOHBA closely match those of the human-expert annotated labels provided in their policy checkpoints. |
| Dataset Splits | Yes | We create an 80-20 train-validation split, then train a 2-layer (8 hidden units each) MLP g via a softmax-cross entropy loss to predict the classes using only zω as input (rather than the actual trajectory τ). |
| Hardware Specification | No | The paper states “We provide all computational details in Appendix A.2.”, but Appendix A.2 does not specify hardware details such as GPU/CPU models or types of compute resources used. |
| Software Dependencies | No | The paper mentions software like “Acme RL library”, “TD3 algorithm”, “RLDS”, “PyTorch”, and “Adam optimizer” but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | Appendix A.2 provides hyperparameters such as the Adam optimizer with a learning rate of 1e-4, batch size of 256, latent dimensions for zω and zα of 2 and 4, respectively, and β values of 0.05 and 0.01 for the hill-climbing and coordination games, and 0.005 for the HalfCheetah and Ant domains. |
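The dataset-split evidence describes a concrete probe: an 80-20 train-validation split and a 2-layer MLP g (8 hidden units per layer) trained with softmax cross-entropy to predict behavior classes from the latent zω alone, using hyperparameters from Appendix A.2 (Adam, learning rate 1e-4, batch size 256, zω dimension 2). A minimal sketch of that setup, assuming synthetic stand-in data and an illustrative class count (neither is specified here):

```python
# Hedged sketch of the probe described in the Dataset Splits row: a 2-layer MLP g
# (8 hidden units each) trained via softmax cross-entropy to predict classes from
# z_omega only. Data is synthetic; N_CLASSES and N are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

Z_DIM, N_CLASSES, N = 2, 4, 1000           # z_omega dim of 2 per Appendix A.2
z = torch.randn(N, Z_DIM)                  # stand-in for per-trajectory latents
y = torch.randint(0, N_CLASSES, (N,))      # stand-in behavior-cluster labels

split = int(0.8 * N)                       # 80-20 train-validation split
z_tr, y_tr, z_va, y_va = z[:split], y[:split], z[split:], y[split:]

g = nn.Sequential(                         # 2-layer MLP, 8 hidden units each
    nn.Linear(Z_DIM, 8), nn.ReLU(),
    nn.Linear(8, 8), nn.ReLU(),
    nn.Linear(8, N_CLASSES),               # logits; softmax folded into the loss
)
opt = torch.optim.Adam(g.parameters(), lr=1e-4)  # Adam, lr 1e-4 per Appendix A.2
loss_fn = nn.CrossEntropyLoss()            # softmax cross-entropy

for epoch in range(5):
    for i in range(0, split, 256):         # batch size 256 per Appendix A.2
        opt.zero_grad()
        loss_fn(g(z_tr[i:i + 256]), y_tr[i:i + 256]).backward()
        opt.step()

with torch.no_grad():
    val_acc = (g(z_va).argmax(dim=1) == y_va).float().mean().item()
print(f"validation accuracy: {val_acc:.3f}")
```

On random labels this probe hovers near chance; in the paper's setting, validation accuracy of g is what quantifies how much class information the latent zω retains.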