Multi-Agent Meta-Reinforcement Learning: Sharper Convergence Rates with Task Similarity

Authors: Weichao Mao, Haoran Qiu, Chen Wang, Hubertus Franke, Zbigniew Kalbarczyk, Ravishankar Iyer, Tamer Basar

NeurIPS 2023

Reproducibility assessment. Each variable below lists the extracted result and the supporting LLM response.

Research Type: Experimental
LLM Response: "We establish the first line of theoretical results for meta-learning in a wide range of fundamental MARL settings... We further provide numerical simulations to corroborate our theoretical findings."

Researcher Affiliation: Collaboration
LLM Response: Weichao Mao (University of Illinois Urbana-Champaign, weichao2@illinois.edu); Haoran Qiu (University of Illinois Urbana-Champaign, haoranq4@illinois.edu); Chen Wang (IBM Research, chen.wang1@ibm.com); Hubertus Franke (IBM Research, frankeh@us.ibm.com); Zbigniew Kalbarczyk (University of Illinois Urbana-Champaign, kalbarcz@illinois.edu); Ravishankar K. Iyer (University of Illinois Urbana-Champaign, rkiyer@illinois.edu); Tamer Basar (University of Illinois Urbana-Champaign, basar1@illinois.edu)

Pseudocode: Yes
LLM Response: Algorithm 1: Optimistic Online Mirror Descent for Zero-Sum Markov Games

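For reference, the paper's Algorithm 1 builds on optimistic online mirror descent. Below is a minimal sketch of optimistic OMD with an entropy mirror map (i.e., optimistic multiplicative weights) on a single-state zero-sum matrix game; the paper's Markov-game algorithm additionally handles state dynamics and value estimates. All function and variable names, the payoff convention, and the default parameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def optimistic_omd_zero_sum(A, T=1000, eta=0.1, x0=None, y0=None):
    """Optimistic online mirror descent with an entropy mirror map
    (optimistic multiplicative weights) on a zero-sum matrix game:
    the row player minimizes x^T A y, the column player maximizes it.

    Single-state sketch of the optimistic-OMD idea behind Algorithm 1;
    the paper's Markov-game version also maintains per-state values."""
    m, n = A.shape
    x_hat = np.ones(m) / m if x0 is None else np.asarray(x0, dtype=float)
    y_hat = np.ones(n) / n if y0 is None else np.asarray(y0, dtype=float)
    g_x_prev, g_y_prev = np.zeros(m), np.zeros(n)  # gradient predictions
    avg_x, avg_y = np.zeros(m), np.zeros(n)

    for _ in range(T):
        # Optimistic step: play against the predicted (previous) gradient.
        x = x_hat * np.exp(-eta * g_x_prev); x /= x.sum()
        y = y_hat * np.exp(-eta * g_y_prev); y /= y.sum()

        # True gradients at the played strategies.
        g_x = A @ y          # row player's loss gradient
        g_y = -A.T @ x       # column player's loss gradient (maximizer)

        # Mirror-descent update of the secondary iterates.
        x_hat = x_hat * np.exp(-eta * g_x); x_hat /= x_hat.sum()
        y_hat = y_hat * np.exp(-eta * g_y); y_hat /= y_hat.sum()

        g_x_prev, g_y_prev = g_x, g_y
        avg_x += x; avg_y += y

    # Average strategies approximate a Nash equilibrium of the game.
    return avg_x / T, avg_y / T
```
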
Open Source Code: No
LLM Response: The paper does not provide any explicit statements or links indicating that source code for the described methodology is publicly available.

Open Datasets: No
LLM Response: "We numerically evaluate our meta-learning algorithms from Sections 3 and 4 on a sequence of K games. In this section, we evaluate on a sequence of K = 10 zero-sum Markov games and Markov potential games... We generate the K = 10 games by first specifying a base game and then adding random perturbations to its reward function to get K slightly different games." The paper uses synthetically generated games and does not provide access information for a publicly available dataset.

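Since no code or data is released, the quoted task-generation procedure can only be approximated. A minimal sketch is below; the base-game distribution, the perturbation scale `noise`, and all names are assumptions for illustration.

```python
import numpy as np

def make_task_sequence(K=10, n_states=2, n_actions=2, noise=0.1, seed=0):
    """Generate K similar games by perturbing one base game's rewards,
    mirroring the quoted setup (K = 10 games, two states, two players,
    two actions each). The perturbation scale `noise` is an assumption."""
    rng = np.random.default_rng(seed)
    # Base reward tensor: R[s, a1, a2] is the row player's reward.
    base_reward = rng.uniform(0, 1, size=(n_states, n_actions, n_actions))
    tasks = []
    for _ in range(K):
        perturbation = rng.uniform(-noise, noise, size=base_reward.shape)
        tasks.append(np.clip(base_reward + perturbation, 0, 1))
    return tasks
```
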
Dataset Splits: No
LLM Response: The paper describes generating and running simulations on K = 10 games for T = 1000 iterations each. It does not mention conventional training/validation/test splits; instead, it evaluates the algorithm's performance across a sequence of tasks.

Hardware Specification: No
LLM Response: The paper mentions "numerical simulations" but does not provide any specific details about the hardware (e.g., CPU or GPU models, memory) used to run them.

Software Dependencies: No
LLM Response: The paper does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks.

Experiment Setup: Yes
LLM Response: "We evaluate on a sequence of K = 10 zero-sum Markov games and Markov potential games with two states, two players, and two candidate actions for each player. Each of the K games is run for T = 1000 iterations. ... (5) with α = 1/√K as the meta-updates... Theorem 1. If Algorithm 1 is run on a two-player zero-sum Markov game for T iterations with a learning rate η ≤ 1/(8H^2)... Proposition 1. ... with α ≤ (1-γ)^4 / (8κ^3 N A_max)."

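The quoted setup applies meta-updates (the paper's equation (5), not reproduced in this report) with step size α = 1/√K. The sketch below only illustrates the overall meta-training loop shape, reusing the two sketches above: run the within-task learner on each of the K games from a shared initialization, then nudge that initialization toward the task's outcome. The convex-combination meta-update shown is a generic stand-in, not the paper's (5), and `eta=0.1` is an assumed within-task learning rate.

```python
import numpy as np

# Meta-training loop over K tasks, reusing make_task_sequence and
# optimistic_omd_zero_sum from the sketches above.
K, T = 10, 1000            # values quoted from the experiment setup
alpha = 1 / np.sqrt(K)     # meta step size alpha = 1/sqrt(K), as quoted
init_x = np.ones(2) / 2    # shared initialization (2 candidate actions)

for reward in make_task_sequence(K=K):
    # Within-task learning on the state-0 payoff matrix (illustration
    # only; the paper runs the full Markov-game algorithm).
    avg_x, _ = optimistic_omd_zero_sum(reward[0], T=T, eta=0.1, x0=init_x)
    # Generic stand-in meta-update (NOT the paper's eq. (5)): pull the
    # initialization toward the strategy this task converged to.
    init_x = (1 - alpha) * init_x + alpha * avg_x
    init_x /= init_x.sum()
```
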