Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices

Authors: Jiin Woo, Laixi Shi, Gauri Joshi, Yuejie Chi

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This work explores the benefit of federated learning for offline RL, aiming to collaboratively leverage offline datasets at multiple agents. Focusing on finite-horizon episodic tabular Markov decision processes (MDPs), we design FedLCB-Q, a variant of the popular model-free Q-learning algorithm tailored for federated offline RL. FedLCB-Q updates local Q-functions at agents with novel learning rate schedules and aggregates them at a central server using importance averaging and a carefully designed pessimistic penalty term. Our sample complexity analysis reveals that, with appropriately chosen parameters and synchronization schedules, FedLCB-Q achieves linear speedup in terms of the number of agents without requiring high-quality datasets at individual agents, as long as the local datasets collectively cover the state-action space visited by the optimal policy, highlighting the power of collaboration in the federated setting.
Researcher Affiliation | Academia | (1) Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA; (2) Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125, USA.
Pseudocode | Yes | Algorithm 1 Federated pessimistic Q-learning (FedLCB-Q) ... Algorithm 2 Local-Q-learning (agents) ... Algorithm 3 Global-pessimistic-averaging (server). Hedged sketches of the agent-side update and the server-side aggregation appear after this table.
Open Source Code | No | The paper does not contain any statement or link indicating the availability of its source code.
Open Datasets | No | The paper works with abstract 'offline datasets' and 'local offline datasets' within a theoretical framework of tabular MDPs; it does not identify a specific, publicly available dataset or provide access information.
Dataset Splits | No | As a theoretical paper, it does not describe experimental data splits for training, validation, or testing.
Hardware Specification | No | As a theoretical paper focused on algorithm design and analysis, it does not mention any hardware specifications used for experiments.
Software Dependencies | No | As a theoretical paper, it does not list any specific software dependencies with version numbers.
Experiment Setup | No | As a theoretical paper, it does not provide details about an experimental setup, such as hyperparameters or training settings.
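
To ground the algorithm description quoted above, here is a minimal Python sketch of the agent-side step (Algorithm 2, Local-Q-learning). Everything in it is an assumption for illustration, not the paper's pseudocode: the dataset format, the function name local_q_learning, and the constant step size eta, which stands in for the paper's carefully designed learning rate schedules.

```python
import numpy as np

def local_q_learning(dataset, Q, V, eta=0.1):
    """One local round of offline Q-learning at a single agent (sketch).

    Assumed interface (not from the paper): `dataset` is a list of
    episodes, each a list of (h, s, a, r, s_next) tuples with step
    index h in [0, H); `Q` has shape (H, S, A); `V` has shape
    (H + 1, S) with V[H] fixed at zero.
    """
    N = np.zeros(Q.shape, dtype=int)  # local visit counts per (h, s, a)
    for episode in dataset:
        for (h, s, a, r, s_next) in episode:
            N[h, s, a] += 1
            target = r + V[h + 1, s_next]        # one-step TD target
            Q[h, s, a] = (1 - eta) * Q[h, s, a] + eta * target
    return Q, N
```

The visit counts N are returned because the server needs them: importance averaging weights each agent's estimate by how much local data actually touched each (h, s, a) triple.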
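
The server-side step (Algorithm 3, Global-pessimistic-averaging) can be sketched in the same spirit. The importance weights and the Hoeffding-style penalty below are illustrative stand-ins; the paper's exact weights and bonus constants differ.

```python
import numpy as np

def global_pessimistic_averaging(local_Qs, local_Ns, c_b=1.0, delta=0.01):
    """Aggregate per-agent Q-tables with importance averaging and an
    LCB-style pessimistic penalty (sketch).

    `local_Qs` and `local_Ns` are lists of per-agent (H, S, A) arrays
    of Q-estimates and visit counts from the local rounds.
    """
    Q_stack = np.stack(local_Qs)           # (K, H, S, A)
    N_stack = np.stack(local_Ns)           # (K, H, S, A)
    N_total = N_stack.sum(axis=0)          # (H, S, A), pooled visit counts
    H = Q_stack.shape[1]

    # Importance averaging: weight each agent by its share of the visits,
    # so agents with more data on (h, s, a) dominate the estimate there.
    weights = N_stack / np.maximum(N_total, 1)
    Q_avg = (weights * Q_stack).sum(axis=0)

    # Pessimistic penalty: shrinks like 1/sqrt(count), pushing sparsely
    # covered pairs toward zero (illustrative Hoeffding-style form).
    bonus = c_b * H * np.sqrt(np.log(1.0 / delta) / np.maximum(N_total, 1))
    Q_pess = np.clip(Q_avg - bonus, 0.0, None)

    # Greedy value estimate, truncated below at zero; V[H] stays zero.
    V = np.zeros((H + 1, Q_pess.shape[1]))
    V[:H] = Q_pess.max(axis=2)
    return Q_pess, V
```

Because the penalty depends on the pooled counts N_total rather than on any single agent's counts, state-action pairs covered by any agent's data escape heavy pessimism. This mirrors the paper's headline claim that collective single-policy coverage across agents, not per-agent coverage, is what the guarantee requires.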