Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Federated Reinforcement Learning: Linear Speedup Under Markovian Sampling
Authors: Sajad Khodadadian, Pranay Sharma, Gauri Joshi, Siva Theja Maguluri
ICML 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose federated versions of on-policy TD, off-policy TD and Q-learning, and analyze their convergence. For all these algorithms, to the best of our knowledge, we are the first to consider Markovian noise and multiple local updates, and prove a linear convergence speedup with respect to the number of agents. |
| Researcher Affiliation | Academia | (1) H. Milton Stewart School of Industrial & Systems Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA; (2) Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, 15213, USA |
| Pseudocode | Yes | Algorithm 1 Federated n-step TD (On-policy, Function Approx.), Algorithm 2 Federated n-step TD (Off-policy Tabular Setting), Algorithm 3 Federated Q-learning, Algorithm 4 Federated Stochastic Approximation with Markovian Noise (Fed SAM) |
| Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and focuses on mathematical analysis and proofs, thus it does not use or specify any public or open datasets for training or evaluation. |
| Dataset Splits | No | This paper is purely theoretical and does not involve experimental validation with datasets, so no training/validation/test splits are specified. |
| Hardware Specification | No | This paper is theoretical and does not describe empirical experiments, therefore no specific hardware specifications for running experiments are provided. |
| Software Dependencies | No | This paper is theoretical and does not describe empirical experiments, therefore no specific software dependencies with version numbers are listed. |
| Experiment Setup | No | This paper is theoretical and focuses on mathematical analysis and proofs rather than empirical experiments, so it does not include details on experimental setup or hyperparameters. |