Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Multi-Agent Learning from Learners

Authors: Mine Melodi Caliskan, Francesco Chini, Setareh Maghsudi

ICML 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically test MA-LfL and we observe high positive correlation between the recovered reward functions and the ground truth. We test MA-LfL experimentally in a 3x3 deterministic grid world environments.
Researcher Affiliation | Academia | ¹Department of Computer Science, University of Tuebingen, Tübingen, Germany.
Pseudocode | Yes | Algorithm 1 Multi-agent Soft Policy Iteration (MA-SPI) ... Algorithm 2 Multi-agent Learning from a Learner (MA-LfL)
Open Source Code | Yes | the source code is available at GitHub¹. ¹https://github.com/melodiCyb/multiagent-learning-from-learners
Open Datasets | No | We test MA-LfL experimentally in a 3x3 deterministic grid world environments. ... The paper does not provide access information or citations for this grid world environment/dataset.
Dataset Splits | No | The paper mentions running experiments in a 3x3 grid world environment but does not specify any training, validation, or test dataset splits.
Hardware Specification | Yes | We execute all experiments under a Conda environment using Python with a computation unit GPU-2080i
Software Dependencies | No | We execute all experiments under a Conda environment using Python with a computation unit GPU-2080i. The paper mentions "Python" but does not specify a version or any other software dependencies with version numbers.
Experiment Setup | Yes | Table 3. Parameters to reproduce results for MA-LfL in Grid World scenario in Section 7 Table 1. This table includes specific parameter values such as Alpha 3, Beta 0.1, Gamma 0.9, Episode Length 1000, Iteration # 10, Episode # 3000, Entropy Coefficient 0.3, Adam Learning Rate 0.1, Adam Epoch # 10, Reward Adam Epoch # 1000, Reward Adam Learning Rate 0.01.
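
The parameter values quoted in the Experiment Setup row could be collected into a single configuration object when attempting a reproduction. The following is a minimal sketch, not the authors' code: the class name, field names, the grid_size field, and the sanity checks are all hypothetical, and only the numeric values come from the row above. How the released repository actually names or loads these settings is not stated here.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class GridWorldConfig:
    """Hypothetical container for the quoted MA-LfL grid world parameters.

    Field names are illustrative assumptions; values are copied from the
    Experiment Setup row (Table 3 of the paper, as quoted).
    """
    alpha: float = 3.0
    beta: float = 0.1
    gamma: float = 0.9                 # quoted as "Gamma 0.9"
    episode_length: int = 1000
    num_iterations: int = 10           # "Iteration # 10"
    num_episodes: int = 3000           # "Episode # 3000"
    entropy_coefficient: float = 0.3
    adam_learning_rate: float = 0.1
    adam_epochs: int = 10              # "Adam Epoch # 10"
    reward_adam_epochs: int = 1000     # "Reward Adam Epoch # 1000"
    reward_adam_learning_rate: float = 0.01
    grid_size: int = 3                 # assumption: side length of the 3x3 grid


def validate(cfg: GridWorldConfig) -> None:
    """Basic sanity checks (assumed here, not taken from the paper)."""
    if not 0.0 < cfg.gamma < 1.0:
        raise ValueError("gamma should lie strictly between 0 and 1")
    if cfg.adam_learning_rate <= 0 or cfg.reward_adam_learning_rate <= 0:
        raise ValueError("learning rates must be positive")
    counts = (cfg.episode_length, cfg.num_iterations, cfg.num_episodes,
              cfg.adam_epochs, cfg.reward_adam_epochs, cfg.grid_size)
    if any(c <= 0 for c in counts):
        raise ValueError("counts and sizes must be positive integers")


CONFIG = GridWorldConfig()
validate(CONFIG)

if __name__ == "__main__":
    # Print the settings so a run log records exactly what was used.
    for name, value in asdict(CONFIG).items():
        print(f"{name} = {value}")
```

Freezing the dataclass keeps the recorded values from being changed accidentally mid-run. Because the Software Dependencies row reports no pinned versions, logging the Python and library versions alongside these settings would be a sensible addition.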