Learning to Teach in Cooperative Multiagent Reinforcement Learning
Authors: Shayegan Omidshafiei, Dong-Ki Kim, Miao Liu, Gerald Tesauro, Matthew Riemer, Christopher Amato, Murray Campbell, Jonathan P. How
AAAI 2019, pp. 6128-6136 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical comparisons against state-of-the-art teaching methods show that our teaching agents not only learn significantly faster, but also learn to coordinate in tasks where existing methods fail. We conduct empirical evaluations on a sequence of increasingly challenging domains involving two agents. |
| Researcher Affiliation | Collaboration | Shayegan Omidshafiei (1,2; shayegan@mit.edu), Dong-Ki Kim (1,2; dkkim93@mit.edu), Miao Liu (2,3; miao.liu1@ibm.com), Gerald Tesauro (2,3; gtesauro@us.ibm.com), Matthew Riemer (2,3; mdriemer@us.ibm.com), Christopher Amato (4; camato@ccs.neu.edu), Murray Campbell (2,3; mcam@us.ibm.com), Jonathan P. How (1,2; jhow@mit.edu). Affiliations: 1 LIDS, MIT; 2 MIT-IBM Watson AI Lab; 3 IBM Research; 4 CCIS, Northeastern University. |
| Pseudocode | Yes | Pseudocode is provided: Algorithm 1 (Get advising-level observations) and Algorithm 2 (the LeCTR algorithm). A hedged sketch of a single advising step appears after this table. |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code for the described methodology or a link to a code repository. |
| Open Datasets | No | The paper describes custom environments ('Repeated game', 'Hallway', 'Room game') but does not provide any links, DOIs, repositories, or formal citations for public access to these datasets or environments. |
| Dataset Splits | No | The paper mentions running '50 independent trials' and 'task-level learning iterations' but does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or references to standard splits). A sketch of the implied multi-trial protocol appears after this table. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using Q-learners, tile-coded policies, neural networks, and an actor-critic approach, but does not list the specific software dependencies or library versions required to replicate the experiments. A tile-coding sketch appears after this table. |
| Experiment Setup | No | The main text does not describe the experimental configuration; it states only 'Refer to the supplementary material for hyperparameters.' |
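
The Pseudocode row notes that Algorithms 1 and 2 are given in the paper but no source code is released. As a reading aid only, here is a minimal Python sketch of a single advising step in a teacher-student setup of the kind LeCTR describes; every name below (`TabularStudent`, `advised_action`, the `NO_ADVICE` sentinel, the observation layout) is a hypothetical assumption, not the authors' implementation.

```python
"""Hypothetical sketch of a LeCTR-style advising step (not the authors' code).

The paper's Algorithm 1 builds advising-level observations from a student's
task-level state, and Algorithm 2 lets a learned teacher policy decide
whether to inject an action. Names and structure here are assumptions.
"""
import numpy as np

N_ACTIONS = 4
NO_ADVICE = N_ACTIONS  # extra advising-level action meaning "do not advise"

class TabularStudent:
    """Task-level epsilon-greedy Q-learner, one of the learner types mentioned."""
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95, eps=0.1):
        self.Q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, s, rng):
        if rng.random() < self.eps:
            return int(rng.integers(self.Q.shape[1]))
        return int(np.argmax(self.Q[s]))

    def update(self, s, a, r, s_next):
        td_target = r + self.gamma * self.Q[s_next].max()
        self.Q[s, a] += self.alpha * (td_target - self.Q[s, a])

def advising_observation(student, s):
    """Algorithm-1-style observation: state index plus the student's Q-estimates."""
    return np.concatenate([[float(s)], student.Q[s]])

def advised_action(teacher_act, student, s, rng):
    """The teacher sees the advising-level observation and either injects an
    action or defers to the student's own task-level policy."""
    advice = teacher_act(advising_observation(student, s))
    return advice if advice != NO_ADVICE else student.act(s, rng)

# Usage with a placeholder teacher that always defers:
rng = np.random.default_rng(0)
student = TabularStudent(n_states=10, n_actions=N_ACTIONS)
action = advised_action(lambda obs: NO_ADVICE, student, s=3, rng=rng)
print(action)
```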
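
The '50 independent trials' noted under Dataset Splits suggests the evaluation protocol: retrain from scratch under different random seeds and report averaged learning curves rather than held-out data splits. A minimal sketch under that assumption, with a hypothetical `train_one_trial` stub standing in for the unreleased environments:

```python
# Sketch of the multi-trial protocol implied by "50 independent trials":
# rerun training under different seeds, then report a mean learning curve
# with a standard error. train_one_trial is a hypothetical stub.
import numpy as np

def train_one_trial(seed, n_iterations=100):
    """Placeholder for one full training run on a task; returns a
    per-iteration performance curve. Swap in the actual domain/learner."""
    rng = np.random.default_rng(seed)
    return np.cumsum(rng.random(n_iterations)) / np.arange(1, n_iterations + 1)

curves = np.stack([train_one_trial(seed) for seed in range(50)])
mean_curve = curves.mean(axis=0)
stderr = curves.std(axis=0, ddof=1) / np.sqrt(curves.shape[0])
print(f"final mean performance: {mean_curve[-1]:.3f} +/- {stderr[-1]:.3f}")
```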
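
The Software Dependencies row mentions tile-coded policies without naming libraries or versions. For readers unfamiliar with the representation, a self-contained sketch of one-dimensional tile coding, with tiling counts and state range chosen as assumptions:

```python
# Minimal tile-coding feature map, illustrating the "tile-coded policies"
# the paper mentions. Tiling parameters below are illustrative assumptions.
import numpy as np

def tile_code(x, n_tilings=8, n_tiles=10, low=0.0, high=1.0):
    """Map a scalar state in [low, high] to sparse binary features:
    one active tile per tiling, each tiling offset by a fraction of a tile."""
    feats = np.zeros(n_tilings * n_tiles, dtype=np.float32)
    scaled = (x - low) / (high - low) * (n_tiles - 1)
    for t in range(n_tilings):
        idx = min(int(scaled + t / n_tilings), n_tiles - 1)
        feats[t * n_tiles + idx] = 1.0
    return feats

phi = tile_code(0.37)
# A linear task-level value function is then Q(s, a) = w[a] @ phi.
```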