Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices

Authors: Jiin Woo, Laixi Shi, Gauri Joshi, Yuejie Chi

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This work explores the benefit of federated learning for offline RL, aiming to collaboratively leverage offline datasets at multiple agents. Focusing on finite-horizon episodic tabular Markov decision processes (MDPs), we design FedLCB-Q, a variant of the popular model-free Q-learning algorithm tailored for federated offline RL. FedLCB-Q updates local Q-functions at agents with novel learning rate schedules and aggregates them at a central server using importance averaging and a carefully designed pessimistic penalty term. Our sample complexity analysis reveals that, with appropriately chosen parameters and synchronization schedules, FedLCB-Q achieves linear speedup in terms of the number of agents without requiring high-quality datasets at individual agents, as long as the local datasets collectively cover the state-action space visited by the optimal policy, highlighting the power of collaboration in the federated setting.
Researcher Affiliation | Academia | (1) Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA; (2) Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125, USA.
Pseudocode | Yes | Algorithm 1 Federated pessimistic Q-learning (FedLCB-Q) ... Algorithm 2 Local-Q-learning (agents) ... Algorithm 3 Global-pessimistic-averaging (server). Hedged sketches of the agent-side update and the server-side aggregation appear after this table.
Open Source Code | No | The paper does not contain any statement or link indicating the availability of its source code.
Open Datasets | No | The paper works with abstract 'offline datasets' and 'local offline datasets' within a theoretical framework of tabular MDPs; it does not identify a specific, publicly available dataset or provide access information.
Dataset Splits | No | As a theoretical paper, it does not describe experimental data splits for training, validation, or testing.
Hardware Specification | No | As a theoretical paper focused on algorithm design and analysis, it does not mention any hardware specifications used for experiments.
Software Dependencies | No | As a theoretical paper, it does not list any specific software dependencies with version numbers.
Experiment Setup | No | As a theoretical paper, it does not provide details about an experimental setup, such as hyperparameters or training settings.
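
To ground the algorithm description quoted above, here is a minimal Python sketch of the agent-side step (Algorithm 2, Local-Q-learning). Everything in it is an assumption for illustration, not the paper's pseudocode: the dataset format, the function name local_q_learning, and the constant step size eta, which stands in for the paper's carefully designed learning rate schedules.

```python
import numpy as np

def local_q_learning(dataset, Q, V, eta=0.1):
    """One local round of offline Q-learning at a single agent (sketch).

    Assumed interface (not from the paper): `dataset` is a list of
    episodes, each a list of (h, s, a, r, s_next) tuples with step
    index h in [0, H); `Q` has shape (H, S, A); `V` has shape
    (H + 1, S) with V[H] fixed at zero.
    """
    N = np.zeros(Q.shape, dtype=int)  # local visit counts per (h, s, a)
    for episode in dataset:
        for (h, s, a, r, s_next) in episode:
            N[h, s, a] += 1
            target = r + V[h + 1, s_next]        # one-step TD target
            Q[h, s, a] = (1 - eta) * Q[h, s, a] + eta * target
    return Q, N
```

The visit counts N are returned because the server needs them: importance averaging weights each agent's estimate by how much local data actually touched each (h, s, a) triple.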
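
The server-side step (Algorithm 3, Global-pessimistic-averaging) can be sketched in the same spirit. The importance weights and the Hoeffding-style penalty below are illustrative stand-ins; the paper's exact weights and bonus constants differ.

```python
import numpy as np

def global_pessimistic_averaging(local_Qs, local_Ns, c_b=1.0, delta=0.01):
    """Aggregate per-agent Q-tables with importance averaging and an
    LCB-style pessimistic penalty (sketch).

    `local_Qs` and `local_Ns` are lists of per-agent (H, S, A) arrays
    of Q-estimates and visit counts from the local rounds.
    """
    Q_stack = np.stack(local_Qs)           # (K, H, S, A)
    N_stack = np.stack(local_Ns)           # (K, H, S, A)
    N_total = N_stack.sum(axis=0)          # (H, S, A), pooled visit counts
    H = Q_stack.shape[1]

    # Importance averaging: weight each agent by its share of the visits,
    # so agents with more data on (h, s, a) dominate the estimate there.
    weights = N_stack / np.maximum(N_total, 1)
    Q_avg = (weights * Q_stack).sum(axis=0)

    # Pessimistic penalty: shrinks like 1/sqrt(count), pushing sparsely
    # covered pairs toward zero (illustrative Hoeffding-style form).
    bonus = c_b * H * np.sqrt(np.log(1.0 / delta) / np.maximum(N_total, 1))
    Q_pess = np.clip(Q_avg - bonus, 0.0, None)

    # Greedy value estimate, truncated below at zero; V[H] stays zero.
    V = np.zeros((H + 1, Q_pess.shape[1]))
    V[:H] = Q_pess.max(axis=2)
    return Q_pess, V
```

Because the penalty depends on the pooled counts N_total rather than on any single agent's counts, state-action pairs covered by any agent's data escape heavy pessimism. This mirrors the paper's headline claim that collective single-policy coverage across agents, not per-agent coverage, is what the guarantee requires.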