Learning to Teach in Cooperative Multiagent Reinforcement Learning
Authors: Shayegan Omidshafiei, Dong-Ki Kim, Miao Liu, Gerald Tesauro, Matthew Riemer, Christopher Amato, Murray Campbell, Jonathan P. How
AAAI 2019, pp. 6128-6136 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical comparisons against state-of-the-art teaching methods show that our teaching agents not only learn significantly faster, but also learn to coordinate in tasks where existing methods fail. We conduct empirical evaluations on a sequence of increasingly challenging domains involving two agents. |
| Researcher Affiliation | Collaboration | Shayegan Omidshafiei (1,2; shayegan@mit.edu), Dong-Ki Kim (1,2; dkkim93@mit.edu), Miao Liu (2,3; miao.liu1@ibm.com), Gerald Tesauro (2,3; gtesauro@us.ibm.com), Matthew Riemer (2,3; mdriemer@us.ibm.com), Christopher Amato (4; camato@ccs.neu.edu), Murray Campbell (2,3; mcam@us.ibm.com), Jonathan P. How (1,2; jhow@mit.edu). Affiliations: 1 LIDS, MIT; 2 MIT-IBM Watson AI Lab; 3 IBM Research; 4 CCIS, Northeastern University. |
| Pseudocode | Yes | Pseudocode is provided: Algorithm 1 (Get advising-level observations) and Algorithm 2 (the LeCTR algorithm). A hedged sketch of a single advising step appears after this table. |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code for the described methodology or a link to a code repository. |
| Open Datasets | No | The paper describes custom environments ('Repeated game', 'Hallway', 'Room game') but does not provide any links, DOIs, repositories, or formal citations for public access to these datasets or environments. |
| Dataset Splits | No | The paper mentions running '50 independent trials' and 'task-level learning iterations' but does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or references to standard splits). A sketch of the implied multi-trial protocol appears after this table. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using Q-learners, tile-coded policies, neural networks, and an actor-critic approach, but does not list the specific software dependencies or library versions required to replicate the experiments. A tile-coding sketch appears after this table. |
| Experiment Setup | No | The main text does not describe the experimental configuration; it states only 'Refer to the supplementary material for hyperparameters.' |
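
The Pseudocode row notes that Algorithms 1 and 2 are given in the paper but no source code is released. As a reading aid only, here is a minimal Python sketch of a single advising step in a teacher-student setup of the kind LeCTR describes; every name below (`TabularStudent`, `advised_action`, the `NO_ADVICE` sentinel, the observation layout) is a hypothetical assumption, not the authors' implementation.

```python
"""Hypothetical sketch of a LeCTR-style advising step (not the authors' code).

The paper's Algorithm 1 builds advising-level observations from a student's
task-level state, and Algorithm 2 lets a learned teacher policy decide
whether to inject an action. Names and structure here are assumptions.
"""
import numpy as np

N_ACTIONS = 4
NO_ADVICE = N_ACTIONS  # extra advising-level action meaning "do not advise"

class TabularStudent:
    """Task-level epsilon-greedy Q-learner, one of the learner types mentioned."""
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95, eps=0.1):
        self.Q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, s, rng):
        if rng.random() < self.eps:
            return int(rng.integers(self.Q.shape[1]))
        return int(np.argmax(self.Q[s]))

    def update(self, s, a, r, s_next):
        td_target = r + self.gamma * self.Q[s_next].max()
        self.Q[s, a] += self.alpha * (td_target - self.Q[s, a])

def advising_observation(student, s):
    """Algorithm-1-style observation: state index plus the student's Q-estimates."""
    return np.concatenate([[float(s)], student.Q[s]])

def advised_action(teacher_act, student, s, rng):
    """The teacher sees the advising-level observation and either injects an
    action or defers to the student's own task-level policy."""
    advice = teacher_act(advising_observation(student, s))
    return advice if advice != NO_ADVICE else student.act(s, rng)

# Usage with a placeholder teacher that always defers:
rng = np.random.default_rng(0)
student = TabularStudent(n_states=10, n_actions=N_ACTIONS)
action = advised_action(lambda obs: NO_ADVICE, student, s=3, rng=rng)
print(action)
```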
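
The '50 independent trials' noted under Dataset Splits suggests the evaluation protocol: retrain from scratch under different random seeds and report averaged learning curves rather than held-out data splits. A minimal sketch under that assumption, with a hypothetical `train_one_trial` stub standing in for the unreleased environments:

```python
# Sketch of the multi-trial protocol implied by "50 independent trials":
# rerun training under different seeds, then report a mean learning curve
# with a standard error. train_one_trial is a hypothetical stub.
import numpy as np

def train_one_trial(seed, n_iterations=100):
    """Placeholder for one full training run on a task; returns a
    per-iteration performance curve. Swap in the actual domain/learner."""
    rng = np.random.default_rng(seed)
    return np.cumsum(rng.random(n_iterations)) / np.arange(1, n_iterations + 1)

curves = np.stack([train_one_trial(seed) for seed in range(50)])
mean_curve = curves.mean(axis=0)
stderr = curves.std(axis=0, ddof=1) / np.sqrt(curves.shape[0])
print(f"final mean performance: {mean_curve[-1]:.3f} +/- {stderr[-1]:.3f}")
```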
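
The Software Dependencies row mentions tile-coded policies without naming libraries or versions. For readers unfamiliar with the representation, a self-contained sketch of one-dimensional tile coding, with tiling counts and state range chosen as assumptions:

```python
# Minimal tile-coding feature map, illustrating the "tile-coded policies"
# the paper mentions. Tiling parameters below are illustrative assumptions.
import numpy as np

def tile_code(x, n_tilings=8, n_tiles=10, low=0.0, high=1.0):
    """Map a scalar state in [low, high] to sparse binary features:
    one active tile per tiling, each tiling offset by a fraction of a tile."""
    feats = np.zeros(n_tilings * n_tiles, dtype=np.float32)
    scaled = (x - low) / (high - low) * (n_tiles - 1)
    for t in range(n_tilings):
        idx = min(int(scaled + t / n_tilings), n_tiles - 1)
        feats[t * n_tiles + idx] = 1.0
    return feats

phi = tile_code(0.37)
# A linear task-level value function is then Q(s, a) = w[a] @ phi.
```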