Multi-Agent Meta-Reinforcement Learning: Sharper Convergence Rates with Task Similarity
Authors: Weichao Mao, Haoran Qiu, Chen Wang, Hubertus Franke, Zbigniew Kalbarczyk, Ravishankar Iyer, Tamer Basar
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical and Experimental | We establish the first line of theoretical results for meta-learning in a wide range of fundamental MARL settings... We further provide numerical simulations to corroborate our theoretical findings. |
| Researcher Affiliation | Collaboration | Weichao Mao (University of Illinois Urbana-Champaign, weichao2@illinois.edu); Haoran Qiu (University of Illinois Urbana-Champaign, haoranq4@illinois.edu); Chen Wang (IBM Research, chen.wang1@ibm.com); Hubertus Franke (IBM Research, frankeh@us.ibm.com); Zbigniew Kalbarczyk (University of Illinois Urbana-Champaign, kalbarcz@illinois.edu); Ravishankar K. Iyer (University of Illinois Urbana-Champaign, rkiyer@illinois.edu); Tamer Basar (University of Illinois Urbana-Champaign, basar1@illinois.edu) |
| Pseudocode | Yes | Algorithm 1: Optimistic Online Mirror Descent for Zero-Sum Markov Games. (A minimal single-state sketch of this update family follows the table.) |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that source code for the described methodology is publicly available. |
| Open Datasets | No | We numerically evaluate our meta-learning algorithms from Sections 3 and 4 on a sequence of K games. In this section, we evaluate on a sequence of K = 10 zero-sum Markov games and Markov potential games... We generate the K = 10 games by first specifying a base game and then adding random perturbations to its reward function to get K slightly different games. The paper uses synthetically generated games and does not provide access information for a publicly available dataset. (An illustrative sketch of this perturbation procedure follows the table.) |
| Dataset Splits | No | The paper describes generating and running simulations on K=10 games for T=1000 iterations each. It does not mention conventional training/validation/test dataset splits, but rather evaluates the performance of the algorithm across a sequence of tasks. |
| Hardware Specification | No | The paper mentions 'numerical simulations' but does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run these simulations. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | We evaluate on a sequence of K = 10 zero-sum Markov games and Markov potential games with two states, two players, and two candidate actions for each player. Each of the K games is run for T = 1000 iterations. ... (5) with α = 1/√K as the meta-updates... Theorem 1. If Algorithm 1 is run on a two-player zero-sum Markov game for T iterations with a learning rate η ≤ 1/(8H^2)... Proposition 1. ... with α ≤ (1-γ)^4/(8κ^3 N A_max). (An illustrative sketch of a meta-update with this step size follows the table.) |
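
The pseudocode row names Optimistic Online Mirror Descent for zero-sum Markov games. For orientation, here is a minimal single-state sketch of that update family (optimistic Hedge, i.e., optimistic OMD with an entropy regularizer) applied to a zero-sum matrix game. The function names, the step size eta = 0.1, and the matrix-game simplification are our assumptions; the paper's Algorithm 1 applies updates of this flavor per state of a Markov game, with Q-value-based losses and a learning rate bounded as in Theorem 1.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def optimistic_hedge(A, T=1000, eta=0.1):
    """Optimistic Hedge (optimistic OMD with an entropy regularizer) on a
    two-player zero-sum matrix game; the row player maximizes x^T A y.
    Single-state sketch only: the paper's Algorithm 1 runs per-state
    updates of this flavor in a Markov game with Q-value-based losses."""
    m, n = A.shape
    Lx, Ly = np.zeros(m), np.zeros(n)      # cumulative payoff gradients
    x, y = softmax(Lx), softmax(Ly)        # uniform initial policies
    xbar, ybar = np.zeros(m), np.zeros(n)  # running sums of played iterates
    for _ in range(T):
        xbar += x
        ybar += y
        gx, gy = A @ y, -A.T @ x           # gradients at the current play
        Lx += gx
        Ly += gy
        # Optimistic step: cumulative gradient plus the latest gradient,
        # used as a prediction of the next one (the "optimism" term).
        x = softmax(eta * (Lx + gx))
        y = softmax(eta * (Ly + gy))
    xbar, ybar = xbar / T, ybar / T
    # Duality gap of the average iterates (zero at a Nash equilibrium).
    gap = (A @ ybar).max() - (xbar @ A).min()
    return xbar, ybar, gap

# Example: in the 2x2 identity-payoff game, the uniform mixed strategy
# is a Nash equilibrium, and the gap shrinks as T grows.
xbar, ybar, gap = optimistic_hedge(np.eye(2), T=1000)
```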
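
The open-datasets row quotes the simulation data: K = 10 games generated by perturbing a base game's reward function. Below is a minimal sketch of such a generator, assuming Gaussian perturbations and rewards clipped to [0, 1]; the quoted text does not report the perturbation distribution, its scale, or whether rewards are clipped.

```python
import numpy as np

def generate_task_sequence(base_reward, K=10, noise=0.1, seed=0):
    """Create K similar games by adding random perturbations to a base
    game's reward tensor R[s, a1, a2]. The noise scale, distribution,
    and clipping to [0, 1] are assumptions, not reported values."""
    rng = np.random.default_rng(seed)
    return [
        np.clip(base_reward + noise * rng.standard_normal(base_reward.shape), 0.0, 1.0)
        for _ in range(K)
    ]

# The paper's setup: two states, two players, two actions per player.
base = np.random.default_rng(1).uniform(size=(2, 2, 2))  # R[s, a1, a2]
games = generate_task_sequence(base, K=10)
```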
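
The experiment-setup row cites meta-update (5) with step size α = 1/√K. The exact form of (5) is not reproduced in this report; the function below is a hypothetical stand-in that only illustrates applying a 1/√K step size to a shared policy initialization in logit space after each task.

```python
import numpy as np

def meta_update(init_logits, task_logits, K):
    """Hypothetical meta-update: move the shared initialization toward
    the policy learned on the latest task with step size alpha = 1/sqrt(K),
    the value used in the paper's simulations. The convex-combination
    form is an illustrative assumption, not the paper's equation (5)."""
    alpha = 1.0 / np.sqrt(K)
    return (1.0 - alpha) * init_logits + alpha * task_logits
```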