Beyond Rewards: a Hierarchical Perspective on Offline Multiagent Behavioral Analysis

Authors: Shayegan Omidshafiei, Andrei Kapishnikov, Yannick Assogba, Lucas Dixon, Been Kim

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments investigate the structure of the learned behavior space, which goes beyond prior works on latent clustering by identifying relationships between individual-agent and joint behaviors. We illustrate that clusters identified by MOHBA are useful for highlighting similarities and differences in behaviors throughout training. We also quantitatively analyze the completeness of discovered behavior clusters by adopting a modified version of the concept-discovery framework of Yeh et al. [13] to identify interesting behavior concepts in our multiagent setting. We then test the scalability of our approach by using it for behavioral analysis of several high-dimensional multiagent MuJoCo environments [14]. Finally, we evaluate the approach on the open-sourced OpenAI hide-and-seek policy checkpoints [10], confirming that the behavioral clusters detected by MOHBA closely match the human-expert annotated labels provided with those checkpoints.
Researcher Affiliation | Industry | Shayegan Omidshafiei (somidshafiei@google.com), Andrei Kapishnikov (kapishnikov@google.com), Yannick Assogba (yassogba@google.com), Lucas Dixon (ldixon@google.com), and Been Kim (beenkim@google.com), all with Google Research.
Pseudocode | Yes | Appendix A.7 provides pseudocode.
Open Source Code | No | The paper states: “We provide details for experiment reproducibility in Appendix A.2. We also include model high-level code in Appendix A.7.” Appendix A.7 contains pseudocode, not runnable open-source code for the methodology, and no external link to a code release is provided.
Open Datasets | Yes | Finally, we evaluate the approach on the open-sourced OpenAI hide-and-seek policy checkpoints [10], confirming that the behavioral clusters detected by MOHBA closely match the human-expert annotated labels provided with those checkpoints.
Dataset Splits | Yes | We create an 80-20 train-validation split, then train a 2-layer (8 hidden units each) MLP g via a softmax cross-entropy loss to predict the classes using only zω as input (rather than the actual trajectory τ); a minimal sketch of this probe appears after the table.
Hardware Specification | No | The paper states “We provide all computational details in Appendix A.2.”, but Appendix A.2 does not specify hardware details such as GPU/CPU models or the types of compute resources used.
Software Dependencies | No | The paper mentions software such as the Acme RL library, the TD3 algorithm, RLDS, PyTorch, and the Adam optimizer, but does not provide version numbers for these dependencies.
Experiment Setup | Yes | Appendix A.2 provides hyperparameters such as the Adam optimizer with a learning rate of 1e-4, a batch size of 256, latent dimensions of 2 and 4 for zω and zα, respectively, and β values of 0.05 and 0.01 for the hill-climbing and coordination games, and 0.005 for the HalfCheetah and Ant domains; a configuration sketch collecting these values appears after the table.
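
As a concrete illustration of the Dataset Splits row, the following PyTorch sketch performs an 80-20 split and trains the small probe classifier g on the latent codes. It is a minimal sketch under stated assumptions: the tensor shapes, number of classes, learning rate, and the reading of “2-layer (8 hidden units each)” as two hidden layers are illustrative choices, not details taken from the paper.

```python
import torch
import torch.nn as nn

# Assumed placeholders: z_omega holds one latent code per trajectory
# (dimension 2, as reported in Appendix A.2) and labels holds integer
# class ids; the number of trajectories and classes is illustrative.
z_omega = torch.randn(1000, 2)
labels = torch.randint(0, 5, (1000,))

# 80-20 train-validation split over trajectories.
perm = torch.randperm(len(z_omega))
n_train = int(0.8 * len(z_omega))
train_idx, val_idx = perm[:n_train], perm[n_train:]

# Probe g: MLP with two hidden layers of 8 units, trained with softmax
# cross-entropy to predict the class from z_omega alone.
g = nn.Sequential(
    nn.Linear(z_omega.shape[1], 8), nn.ReLU(),
    nn.Linear(8, 8), nn.ReLU(),
    nn.Linear(8, int(labels.max()) + 1),
)
optimizer = torch.optim.Adam(g.parameters(), lr=1e-3)  # illustrative learning rate
loss_fn = nn.CrossEntropyLoss()

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(g(z_omega[train_idx]), labels[train_idx])
    loss.backward()
    optimizer.step()

with torch.no_grad():
    val_acc = (g(z_omega[val_idx]).argmax(dim=-1) == labels[val_idx]).float().mean()
    print(f"validation accuracy: {val_acc:.3f}")
```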
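
The hyperparameters reported under Experiment Setup can be gathered into a single configuration, as in the sketch below. This is a hedged illustration: the placeholder encoder and the β-weighted objective are assumptions about how the reported values fit together, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Values as reported in Appendix A.2 of the paper.
config = {
    "learning_rate": 1e-4,
    "batch_size": 256,
    "latent_dim_z_omega": 2,
    "latent_dim_z_alpha": 4,
    "beta": {
        "hill_climbing": 0.05,
        "coordination_game": 0.01,
        "half_cheetah": 0.005,
        "ant": 0.005,
    },
}

# Placeholder encoder; the actual MOHBA architecture is not reproduced here.
encoder = nn.Linear(16, config["latent_dim_z_omega"])
optimizer = torch.optim.Adam(encoder.parameters(), lr=config["learning_rate"])

def objective(reconstruction_loss: torch.Tensor,
              regularizer: torch.Tensor,
              domain: str) -> torch.Tensor:
    # Assumed beta-weighted loss (reconstruction plus beta * regularizer);
    # the exact role of beta in MOHBA's objective is not specified here.
    return reconstruction_loss + config["beta"][domain] * regularizer
```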