Recursive Reasoning Graph for Multi-Agent Reinforcement Learning
Authors: Xiaobai Ma, David Isele, Jayesh K. Gupta, Kikuo Fujimura, Mykel J. Kochenderfer
AAAI 2022, pp. 7664–7671
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed algorithm, referred to as the Recursive Reasoning Graph (R2G), shows state-of-the-art performance on multiple multi-agent particle and robotics games. |
| Researcher Affiliation | Collaboration | Xiaobai Ma¹, David Isele², Jayesh K. Gupta¹, Kikuo Fujimura², Mykel J. Kochenderfer¹ (¹Stanford University, ²Honda Research Institute US) |
| Pseudocode | Yes | Algorithm 1: Recursive Reasoning Graph (R2G) |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the described methodology or a direct link to a code repository. |
| Open Datasets | Yes | Particle World (Lowe et al. 2017) and Robo Sumo (Al Shedivat et al. 2018) are cited, indicating the use of established environments/benchmarks, which typically imply publicly available data or simulators. |
| Dataset Splits | No | The paper describes experiments conducted within multi-agent simulation environments (Particle World, Robo Sumo) rather than on fixed datasets with traditional train/validation/test splits. Therefore, it does not provide explicit dataset split percentages or counts. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions) that would be needed for reproducibility. |
| Experiment Setup | No | While the paper mentions aspects like '5 random seeds' for evaluation and discusses the experimental environments, it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations within the main text. |
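The paper's evaluation protocol is reported only at the level of "5 random seeds," with no published code or hyperparameters. A minimal sketch of how such multi-seed results are typically aggregated is shown below; the `evaluate` function is a hypothetical stand-in (an assumption, not the paper's implementation), returning a placeholder episodic score for each seed.

```python
import random
import statistics

def evaluate(seed: int) -> float:
    """Hypothetical stand-in for one evaluation run of a trained policy.

    The paper does not release its evaluation code, so this toy function
    merely returns a seed-dependent placeholder score.
    """
    rng = random.Random(seed)
    return 100.0 + rng.gauss(0.0, 5.0)  # placeholder episodic return

# Five seeds, mirroring the "5 random seeds" protocol mentioned in the paper.
SEEDS = [0, 1, 2, 3, 4]
scores = [evaluate(s) for s in SEEDS]

# Report mean ± standard deviation across seeds, the usual way such
# multi-agent RL results are summarized.
mean = statistics.mean(scores)
stdev = statistics.stdev(scores)
print(f"return: {mean:.1f} +/- {stdev:.1f} over {len(SEEDS)} seeds")
```

Without the original seed values, hyperparameters, or environment versions, a sketch like this can reproduce the aggregation procedure but not the reported numbers themselves.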