An Adaptive Entropy-Regularization Framework for Multi-Agent Reinforcement Learning
Authors: Woojun Kim, Youngchul Sung
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that the proposed scheme significantly outperforms current state-of-the-art multi-agent RL algorithms. Section 4 (Experiments): Here, we provide numerical results and ablation studies. |
| Researcher Affiliation | Academia | School of Electrical Engineering, KAIST, Daejeon 34141, Republic of Korea. |
| Pseudocode | Yes | Algorithm 1 ADaptive Entropy-Regularization for multi-agent reinforcement learning (ADER) |
| Open Source Code | Yes | The source code is available at https://github.com/wjkim1202/ader. |
| Open Datasets | Yes | Multi-agent HalfCheetah (Peng et al., 2021), Heterogeneous Predator-Prey (HPP), and the StarCraft II micromanagement benchmark (SMAC) environment (Samvelyan et al., 2019). |
| Dataset Splits | No | The paper uses standard benchmark environments (Half Cheetah, HPP, SMAC) but does not explicitly provide details on train/validation/test dataset splits, such as specific percentages, sample counts, or clear partitioning methodology for reproducibility. |
| Hardware Specification | Yes | We conducted the experiments on a server with an Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz and 8 NVIDIA Titan Xp GPUs. |
| Software Dependencies | No | The paper mentions software such as PyTorch and specific environments (SMAC, GRF) but does not provide version numbers for any software libraries or dependencies used in the implementation. |
| Experiment Setup | Yes | We use an MLP with 2 hidden layers, which have 400 and 300 hidden units and ReLU activation functions. The replay buffer stores up to 10^6 transitions and 100 transitions are uniformly sampled for training. We set the hyperparameter for EMA filter as ξ = 0.9 and initialize the temperature parameters as α^i_init = e^ 2 for all i ∈ N. Table 1 lists the specific hyperparameters for SMAC. |
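
The replay-buffer and EMA settings quoted in the Experiment Setup row can be illustrated with a minimal standard-library sketch. It is not taken from the authors' repository. The names `ReplayBuffer` and `ema_update` are hypothetical. The EMA form `xi * previous + (1 - xi) * new_value` and sampling without replacement are assumptions, since the row gives only ξ = 0.9 and says the 100 transitions are sampled uniformly.

```python
import random

BUFFER_CAPACITY = 10**6  # maximum number of stored transitions (from the setup row)
BATCH_SIZE = 100         # transitions sampled per training step (from the setup row)
XI = 0.9                 # EMA filter hyperparameter (from the setup row)


class ReplayBuffer:
    """Fixed-capacity ring buffer with uniform sampling (hypothetical helper)."""

    def __init__(self, capacity=BUFFER_CAPACITY):
        self.capacity = capacity
        self._storage = []
        self._next = 0  # index that the next write will use

    def __len__(self):
        return len(self._storage)

    def add(self, transition):
        # Append until the buffer is full, then overwrite the oldest entry.
        if len(self._storage) < self.capacity:
            self._storage.append(transition)
        else:
            self._storage[self._next] = transition
        self._next = (self._next + 1) % self.capacity

    def sample(self, batch_size=BATCH_SIZE):
        # Assumption: uniform sampling without replacement.
        return random.sample(self._storage, batch_size)


def ema_update(previous, new_value, xi=XI):
    # Assumption: xi weights the previous estimate, (1 - xi) the new value.
    return xi * previous + (1.0 - xi) * new_value
```

With these defaults, a training step would call `buffer.sample()` once the buffer holds at least 100 transitions. The temperature initialization and the network itself are left out because the row does not pin them down unambiguously.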