Iteratively Learn Diverse Strategies with State Distance Information
Authors: Wei Fu, Weihua Du, Jingwei Li, Sunli Chen, Jingzhao Zhang, Yi Wu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically examine SIPO across three domains from robot locomotion to multi-agent games. In all of our testing environments, SIPO consistently produces strategically diverse and human-interpretable policies that cannot be discovered by existing baselines. |
| Researcher Affiliation | Academia | Wei Fu¹, Weihua Du¹, Jingwei Li¹, Sunli Chen¹, Jingzhao Zhang¹,², Yi Wu¹,²; ¹ IIIS, Tsinghua University; ² Shanghai Qi Zhi Institute; fuwth17@gmail.com, jxwuyi@gmail.com |
| Pseudocode | Yes | The pseudocode of SIPO can be found in App. G. |
| Open Source Code | No | Explanation: The paper mentions a project website for 'GIF demonstrations' but neither states that the SIPO source code is publicly released nor links to a code repository for the implementation. |
| Open Datasets | Yes | We use the Humanoid environment in Isaac Gym [42]... We adopt the SMAC environment in the MAPPO codebase... We adopt the simple115v2 representation as observation [for GRF]. |
| Dataset Splits | No | Explanation: The paper describes training processes and evaluation metrics for different environments but does not explicitly specify dataset splits (e.g., percentages or sample counts for training, validation, and testing). |
| Hardware Specification | Yes | All algorithms run for the same number of environment frames on a desktop machine with an RTX3090 GPU. |
| Software Dependencies | No | Explanation: The paper states that its implementation is based on 'MAPPO [69]' and mentions other baselines, but it does not provide specific version numbers for any key software components or libraries required for reproduction. |
| Experiment Setup | Yes | Table 15: Hyperparameters in the 2D navigation environment. Table 16: Common hyperparameters for SIPO, baselines, and ablations. Table 17: SIPO hyperparameters across all environments. |