Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
An Adaptive Entropy-Regularization Framework for Multi-Agent Reinforcement Learning
Authors: Woojun Kim, Youngchul Sung
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Experimental results show that the proposed scheme significantly outperforms current state-of-the-art multi-agent RL algorithms." and "4. Experiments: Here, we provide numerical results and ablation studies." |
| Researcher Affiliation | Academia | School of Electrical Engineering, KAIST, Daejeon 34141, Republic of Korea. |
| Pseudocode | Yes | Algorithm 1 ADaptive Entropy-Regularization for multi-agent reinforcement learning (ADER) |
| Open Source Code | Yes | The source code is available at https://github.com/wjkim1202/ader. |
| Open Datasets | Yes | Multi-agent Half Cheetah (Peng et al., 2021), Heterogeneous Predator-Prey (HPP), and the StarCraft II micromanagement benchmark (SMAC) environment (Samvelyan et al., 2019). |
| Dataset Splits | No | The paper uses standard benchmark environments (Half Cheetah, HPP, SMAC) but does not explicitly provide details on train/validation/test dataset splits, such as specific percentages, sample counts, or clear partitioning methodology for reproducibility. |
| Hardware Specification | Yes | We conducted the experiments on a server with an Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz and 8 NVIDIA Titan Xp GPUs. |
| Software Dependencies | No | The paper mentions software such as PyTorch and specific environments (SMAC, GRF) but does not provide version numbers for any of the libraries or dependencies used in the implementation. |
| Experiment Setup | Yes | We use an MLP with 2 hidden layers which have 400 and 300 hidden units and ReLU activation functions. The replay buffer stores up to 10^6 transitions and 100 transitions are uniformly sampled for training. We set the hyperparameter for the EMA filter as ξ = 0.9 and initialize the temperature parameters as α_i^init = e^2 for all i ∈ N. Table 1 additionally lists specific hyperparameters for SMAC. |
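The Experiment Setup row lists concrete training settings. As a hedged illustration, the sketch below shows one way a replay buffer with those settings (capacity 10^6, batches of 100 sampled uniformly) and an EMA filter with ξ = 0.9 could be written. It is not taken from the ADER repository: the class and field names are hypothetical, uniform sampling is assumed to be without replacement within a batch, the update rule y ← ξ·y + (1 − ξ)·x is an assumed convention, and the quantity the paper actually filters is not specified here.

```python
import random
from dataclasses import dataclass


@dataclass
class SetupConfig:
    # Values from the Experiment Setup row; the field names are hypothetical.
    hidden_sizes: tuple = (400, 300)  # two hidden layers with ReLU (not built here)
    buffer_capacity: int = 10**6      # replay buffer holds up to 10^6 transitions
    batch_size: int = 100             # transitions sampled per training update
    ema_xi: float = 0.9               # EMA filter coefficient ξ


class ReplayBuffer:
    """Ring buffer: once full, each new transition overwrites the oldest one."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.storage = []
        self.next_idx = 0  # slot to write next (the oldest entry once full)

    def __len__(self) -> int:
        return len(self.storage)

    def add(self, transition) -> None:
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.next_idx] = transition
        self.next_idx = (self.next_idx + 1) % self.capacity

    def sample(self, batch_size: int, rng: random.Random) -> list:
        # Assumption: uniform sampling without replacement within one batch.
        indices = rng.sample(range(len(self.storage)), batch_size)
        return [self.storage[i] for i in indices]


class EMAFilter:
    """Assumed update rule: y <- xi * y + (1 - xi) * x; the first input sets y."""

    def __init__(self, xi: float):
        self.xi = xi
        self.value = None

    def update(self, x: float) -> float:
        if self.value is None:
            self.value = x
        else:
            self.value = self.xi * self.value + (1.0 - self.xi) * x
        return self.value


# Usage with placeholder transitions (integers standing in for real tuples).
cfg = SetupConfig()
rng = random.Random(0)
buffer = ReplayBuffer(cfg.buffer_capacity)
for t in range(500):
    buffer.add(t)
batch = buffer.sample(cfg.batch_size, rng)

ema = EMAFilter(cfg.ema_xi)
for x in [1.0, 0.0, 0.0]:
    ema.update(x)
print(len(batch), round(ema.value, 2))  # prints: 100 0.81
```

A list-backed ring buffer keeps both `add` and `sample` cheap at a capacity of 10^6; a `collections.deque(maxlen=...)` would also evict the oldest item, but indexing into the middle of a deque is slow.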