Multi-Agent Learning from Learners
Authors: Mine Melodi Caliskan, Francesco Chini, Setareh Maghsudi
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically test MA-LfL and observe a high positive correlation between the recovered reward functions and the ground truth. We test MA-LfL experimentally in a 3×3 deterministic grid world environment. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of Tübingen, Tübingen, Germany. |
| Pseudocode | Yes | Algorithm 1 Multi-agent Soft Policy Iteration (MA-SPI) ... Algorithm 2 Multi-agent Learning from a Learner (MA-LfL) |
| Open Source Code | Yes | The source code is available at GitHub: https://github.com/melodiCyb/multiagent-learning-from-learners |
| Open Datasets | No | We test MA-LfL experimentally in a 3×3 deterministic grid world environment. ... The paper does not provide access information or citations for this grid world environment/dataset. |
| Dataset Splits | No | The paper mentions running experiments in a 3x3 grid world environment but does not specify any training, validation, or test dataset splits. |
| Hardware Specification | Yes | We execute all experiments under a Conda environment using Python with a computation unit GPU-2080i |
| Software Dependencies | No | We execute all experiments under a Conda environment using Python with a computation unit GPU-2080i. The paper mentions "Python" but does not specify a version or any other software dependencies with version numbers. |
| Experiment Setup | Yes | Table 3. Parameters to reproduce results for MA-LfL in the Grid World scenario of Section 7, Table 1. This table includes specific parameter values: Alpha 3, Beta 0.1, Gamma 0.9, Episode Length 1000, Iteration # 10, Episode # 3000, Entropy Coefficient 0.3, Adam Learning Rate 0.1, Adam Epoch # 10, Reward Adam Epoch # 1000, Reward Adam Learning Rate 0.01. |
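The hyperparameters reported in Table 3 can be collected into a single configuration sketch for reproduction. The key names below are illustrative, not taken from the authors' repository, which may use different identifiers:

```python
# Hyperparameters reported in Table 3 of the paper for the
# 3x3 grid-world MA-LfL experiments. Key names are illustrative;
# consult the authors' GitHub repository for the exact identifiers.
MALFL_GRIDWORLD_CONFIG = {
    "alpha": 3,                        # Alpha
    "beta": 0.1,                       # Beta
    "gamma": 0.9,                      # discount factor (Gamma)
    "episode_length": 1000,            # Episode Length
    "iterations": 10,                  # Iteration #
    "episodes": 3000,                  # Episode #
    "entropy_coefficient": 0.3,        # Entropy Coefficient
    "adam_learning_rate": 0.1,         # Adam Learning Rate
    "adam_epochs": 10,                 # Adam Epoch #
    "reward_adam_epochs": 1000,        # Reward Adam Epoch #
    "reward_adam_learning_rate": 0.01, # Reward Adam Learning Rate
}
```

Centralizing these values in one mapping makes it straightforward to log the full configuration alongside each run when checking reproducibility.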