Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices
Authors: Jiin Woo, Laixi Shi, Gauri Joshi, Yuejie Chi
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This work explores the benefit of federated learning for offline RL, aiming to collaboratively leverage offline datasets held by multiple agents. Focusing on finite-horizon episodic tabular Markov decision processes (MDPs), we design Fed LCB-Q, a variant of the popular model-free Q-learning algorithm tailored for federated offline RL. Fed LCB-Q updates local Q-functions at agents with novel learning rate schedules and aggregates them at a central server using importance averaging and a carefully designed pessimistic penalty term. Our sample complexity analysis reveals that, with appropriately chosen parameters and synchronization schedules, Fed LCB-Q achieves linear speedup in terms of the number of agents without requiring high-quality datasets at individual agents, as long as the local datasets collectively cover the state-action space visited by the optimal policy, highlighting the power of collaboration in the federated setting. |
| Researcher Affiliation | Academia | 1Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA 2Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125, USA. |
| Pseudocode | Yes | Algorithm 1 Federated pessimistic Q-learning (Fed LCB-Q) ... Algorithm 2 Local-Q-learning (agents) ... Algorithm 3 Global-pessimistic-averaging (server) |
| Open Source Code | No | The paper does not contain any statement or link indicating the availability of its source code. |
| Open Datasets | No | The paper describes using abstract 'offline datasets' and 'local offline datasets' within a theoretical framework of tabular MDPs, but does not provide details for a specific, publicly available dataset or access information. |
| Dataset Splits | No | As a theoretical paper, it does not describe experimental data splits for training, validation, or testing. |
| Hardware Specification | No | As a theoretical paper focused on algorithm design and analysis, it does not mention any hardware specifications used for experiments. |
| Software Dependencies | No | As a theoretical paper, it does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | As a theoretical paper, it does not provide details about an experimental setup, such as hyperparameters or training settings. |
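Since the paper releases no code, the mechanism described above (local pessimistic Q-learning at each agent, followed by count-weighted averaging and a lower-confidence-bound penalty at the server) can be illustrated with a minimal sketch. This is not the authors' Fed LCB-Q implementation: the rescaled-linear learning rate and the simple `c_b * H / sqrt(N)` penalty below are hypothetical stand-ins for the paper's carefully designed schedules and penalty term, and the dataset format is assumed.

```python
import numpy as np

def federated_pessimistic_q(datasets, S, A, H, rounds=10, local_steps=50,
                            c_b=1.0, seed=0):
    """Illustrative sketch of federated pessimistic Q-learning.

    datasets: one list per agent of (h, s, a, r, s_next) transitions,
              with step index h in [0, H), states in [0, S), actions in [0, A).
    Returns the global pessimistic Q-table and greedy value estimates.
    """
    K = len(datasets)
    Q = np.zeros((H + 1, S, A))  # global table; Q[H] = 0 is the terminal level
    rng = np.random.default_rng(seed)
    for _ in range(rounds):
        local_Q, counts = [], np.zeros((K, H, S, A))
        for k, data in enumerate(datasets):
            Qk, Nk = Q.copy(), np.zeros((H, S, A))
            for _ in range(local_steps):
                # Sample a logged transition from this agent's offline dataset.
                h, s, a, r, s2 = data[rng.integers(len(data))]
                Nk[h, s, a] += 1
                lr = (H + 1) / (H + Nk[h, s, a])  # assumed rescaled-linear rate
                target = r + Qk[h + 1, s2].max()
                Qk[h, s, a] += lr * (target - Qk[h, s, a])
            local_Q.append(Qk)
            counts[k] = Nk
        # Server: importance (visit-count-weighted) averaging across agents,
        # then subtract a pessimistic bonus where total coverage is thin.
        total = counts.sum(axis=0)                      # (H, S, A)
        w = counts / np.maximum(total, 1)               # per-agent weights
        stacked = np.stack([q[:H] for q in local_Q])    # (K, H, S, A)
        Q_avg = np.einsum('khsa,khsa->hsa', w, stacked)
        bonus = c_b * H * np.sqrt(1.0 / np.maximum(total, 1))
        Q[:H] = np.clip(Q_avg - bonus, 0.0, H)          # keep Q in [0, H]
    V = Q[:H].max(axis=-1)  # greedy value estimates per (h, s)
    return Q, V

# Tiny synthetic example: two agents whose datasets jointly cover the
# state-action pairs, echoing the collective-coverage condition.
datasets = [
    [(0, 0, 0, 1.0, 1), (1, 1, 0, 1.0, 0)],  # agent 1's logged transitions
    [(0, 0, 1, 0.0, 0), (1, 0, 0, 0.0, 0)],  # agent 2's logged transitions
]
Q, V = federated_pessimistic_q(datasets, S=2, A=2, H=2)
```

Note how the penalty depends on the *aggregate* visit counts: a state-action pair needs coverage from some agent, not every agent, mirroring the paper's collaborative single-policy coverage condition.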