Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Learning to Teach in Cooperative Multiagent Reinforcement Learning
Authors: Shayegan Omidshafiei, Dong-Ki Kim, Miao Liu, Gerald Tesauro, Matthew Riemer, Christopher Amato, Murray Campbell, Jonathan P. How
Pages: 6128-6136
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical comparisons against state-of-the-art teaching methods show that our teaching agents not only learn significantly faster, but also learn to coordinate in tasks where existing methods fail. We conduct empirical evaluations on a sequence of increasingly challenging domains involving two agents. |
| Researcher Affiliation | Collaboration | Shayegan Omidshafiei (1,2), Dong-Ki Kim (1,2), Miao Liu (2,3), Gerald Tesauro (2,3), Matthew Riemer (2,3), Christopher Amato (4), Murray Campbell (2,3), Jonathan P. How (1,2). Affiliations: (1) LIDS, MIT; (2) MIT-IBM Watson AI Lab; (3) IBM Research; (4) CCIS, Northeastern University. |
| Pseudocode | Yes | Pseudocode is presented in Algorithm 2. Algorithm 1: Get advising-level observations. Algorithm 2: LeCTR Algorithm. |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code for the described methodology or a link to a code repository. |
| Open Datasets | No | The paper describes custom environments ('Repeated game', 'Hallway', 'Room game') but does not provide any links, DOIs, repositories, or formal citations for public access to these datasets or environments. |
| Dataset Splits | No | The paper mentions running '50 independent trials' and 'task-level learning iterations' but does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or references to standard splits). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using Q-learners, tile-coded policies, neural networks, and an actor-critic approach, but does not provide specific software dependencies or library version numbers required to replicate the experiments. |
| Experiment Setup | No | Refer to the supplementary material for hyperparameters. |