Iteratively Learn Diverse Strategies with State Distance Information

Authors: Wei Fu, Weihua Du, Jingwei Li, Sunli Chen, Jingzhao Zhang, Yi Wu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically examine SIPO across three domains from robot locomotion to multi-agent games. In all of our testing environments, SIPO consistently produces strategically diverse and human-interpretable policies that cannot be discovered by existing baselines.
Researcher Affiliation | Academia | Wei Fu¹, Weihua Du¹, Jingwei Li¹, Sunli Chen¹, Jingzhao Zhang¹,², Yi Wu¹,². ¹ IIIS, Tsinghua University; ² Shanghai Qi Zhi Institute. Contact: fuwth17@gmail.com, jxwuyi@gmail.com
Pseudocode | Yes | The pseudocode of SIPO can be found in App. G.
Open Source Code | No | Explanation: The paper mentions a project website for 'GIF demonstrations' but does not explicitly state that the source code for SIPO is publicly released, nor does it provide a direct link to a code repository for the implementation.
Open Datasets | Yes | We use the Humanoid environment in Isaac Gym [42]... We adopt the SMAC environment in the MAPPO codebase... We adopt the simple115v2 representation as observation [for GRF].
Dataset Splits | No | Explanation: The paper describes training processes and evaluation metrics for different environments but does not explicitly specify dataset splits (e.g., percentages or sample counts for training, validation, and testing).
Hardware Specification | Yes | All algorithms run for the same number of environment frames on a desktop machine with an RTX3090 GPU.
Software Dependencies | No | Explanation: The paper states that its implementation is based on 'MAPPO [69]' and mentions other baselines, but it does not provide specific version numbers for any key software components or libraries required for reproduction.
Experiment Setup | Yes | Table 15: Hyperparameters in the 2D navigation environment. Table 16: Common hyperparameters for SIPO, baselines, and ablations. Table 17: SIPO hyperparameters across all environments.