Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Multi-Agent Meta-Reinforcement Learning: Sharper Convergence Rates with Task Similarity
Authors: Weichao Mao, Haoran Qiu, Chen Wang, Hubertus Franke, Zbigniew Kalbarczyk, Ravishankar Iyer, Tamer Basar
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We establish the first line of theoretical results for meta-learning in a wide range of fundamental MARL settings... We further provide numerical simulations to corroborate our theoretical findings. |
| Researcher Affiliation | Collaboration | Weichao Mao (University of Illinois Urbana-Champaign); Haoran Qiu (University of Illinois Urbana-Champaign); Chen Wang (IBM Research); Hubertus Franke (IBM Research); Zbigniew Kalbarczyk (University of Illinois Urbana-Champaign); Ravishankar K. Iyer (University of Illinois Urbana-Champaign); Tamer Basar (University of Illinois Urbana-Champaign) |
| Pseudocode | Yes | Algorithm 1: Optimistic Online Mirror Descent for Zero-Sum Markov Games |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that source code for the described methodology is publicly available. |
| Open Datasets | No | We numerically evaluate our meta-learning algorithms from Sections 3 and 4 on a sequence of K games. In this section, we evaluate on a sequence of K = 10 zero-sum Markov games and Markov potential games... We generate the K = 10 games by first specifying a base game and then adding random perturbations to its reward function to get K slightly different games. The paper uses synthetically generated data/games and does not provide access information for a publicly available dataset. |
| Dataset Splits | No | The paper describes generating and running simulations on K=10 games for T=1000 iterations each. It does not mention conventional training/validation/test dataset splits, but rather evaluates the performance of the algorithm across a sequence of tasks. |
| Hardware Specification | No | The paper mentions 'numerical simulations' but does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run these simulations. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | We evaluate on a sequence of $K = 10$ zero-sum Markov games and Markov potential games with two states, two players, and two candidate actions for each player. Each of the $K$ games is run for $T = 1000$ iterations. ... (5) with $\alpha = 1/\sqrt{K}$ as the meta-updates... Theorem 1. If Algorithm 1 is run on a two-player zero-sum Markov game for $T$ iterations with a learning rate $\eta \le 1/(8H^2)$... Proposition 1. ... with $\alpha \le (1-\gamma)^4 / (8\kappa^3 N A_{\max})$. |
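
The task-generation procedure quoted in the Open Datasets and Experiment Setup rows (a base game plus random reward perturbations, giving K = 10 similar games with two states and two actions per player) can be illustrated with a minimal sketch. This is not the authors' code, which the paper does not release. The function names, the uniform noise distribution, the perturbation scale of 0.05, the reward range [0, 1], and the random base game are all assumptions made for illustration; the paper excerpt does not specify them.

```python
import random


def make_base_rewards(num_states=2, num_actions=2, seed=0):
    """Hypothetical base game: rewards[s][a][b] is player 1's reward in
    state s when player 1 plays a and player 2 plays b (zero-sum case)."""
    rng = random.Random(seed)
    return [[[rng.uniform(0.0, 1.0) for _ in range(num_actions)]
             for _ in range(num_actions)]
            for _ in range(num_states)]


def perturb_rewards(base, scale, rng):
    """Add independent uniform noise in [-scale, scale] to every reward
    entry, then clip back into [0, 1]. Noise model is an assumption."""
    return [[[min(1.0, max(0.0, r + rng.uniform(-scale, scale)))
              for r in row]
             for row in state]
            for state in base]


def make_game_sequence(num_games=10, scale=0.05, seed=0):
    """Return the base reward table and num_games perturbed copies of it."""
    base = make_base_rewards(seed=seed)
    rng = random.Random(seed + 1)
    games = [perturb_rewards(base, scale, rng) for _ in range(num_games)]
    return base, games


def max_deviation(base, game):
    """Largest absolute reward difference between a game and the base game,
    a simple proxy for how similar the tasks are."""
    return max(abs(g - b)
               for base_state, game_state in zip(base, game)
               for base_row, game_row in zip(base_state, game_state)
               for b, g in zip(base_row, game_row))


base, games = make_game_sequence()
deviations = [max_deviation(base, g) for g in games]
```

Because every perturbed game stays within `scale` of the base game entrywise, `deviations` gives a rough measure of the task similarity that the paper's title refers to; a smaller `scale` produces a more similar sequence of games.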