Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Iteratively Learn Diverse Strategies with State Distance Information
Authors: Wei Fu, Weihua Du, Jingwei Li, Sunli Chen, Jingzhao Zhang, Yi Wu
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically examine SIPO across three domains from robot locomotion to multi-agent games. In all of our testing environments, SIPO consistently produces strategically diverse and human-interpretable policies that cannot be discovered by existing baselines. |
| Researcher Affiliation | Academia | Wei Fu¹, Weihua Du¹, Jingwei Li¹, Sunli Chen¹, Jingzhao Zhang¹˒², Yi Wu¹˒²; ¹ IIIS, Tsinghua University, ² Shanghai Qi Zhi Institute |
| Pseudocode | Yes | The pseudocode of SIPO can be found in App. G. |
| Open Source Code | No | Explanation: The paper mentions a project website for 'GIF demonstrations' but does not explicitly state that the source code for SIPO or its methodology is publicly released or provide a direct link to a code repository for their specific implementation. |
| Open Datasets | Yes | We use the Humanoid environment in Isaac Gym [42]... We adopt the SMAC environment in the MAPPO codebase... We adopt the simple115v2 representation as observation [for GRF]. |
| Dataset Splits | No | Explanation: The paper describes training processes and evaluation metrics for different environments but does not explicitly specify dataset splits (e.g., percentages or sample counts for training, validation, and testing). |
| Hardware Specification | Yes | All algorithms run for the same number of environment frames on a desktop machine with an RTX3090 GPU. |
| Software Dependencies | No | Explanation: The paper states that its implementation is based on 'MAPPO [69]' and mentions other baselines, but it does not provide specific version numbers for any key software components or libraries required for reproduction. |
| Experiment Setup | Yes | Table 15: Hyperparameters in the 2D navigation environment. Table 16: Common hyperparameters for SIPO, baselines, and ablations. Table 17: SIPO hyperparameters across all environments. |