Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup and Beyond
Authors: Jiin Woo, Gauri Joshi, Yuejie Chi
ICML 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct numerical experiments to demonstrate the performance of the asynchronous Q-learning algorithms (Fed Asyn Q-Eq Avg and Fed Asyn Q-Im Avg). Experimental setup. Consider an MDP M = (S, A, P, r, g) described in Figure 2, where S = {0, 1} and A = {1, 2, , m}. Figure 3 shows the normalized Q-estimate error (1 g) QT Q with respect to the sample size T, with K = 20 and = 50. |
| Researcher Affiliation | Academia | Jiin Woo 1 Gauri Joshi 1 Yuejie Chi 1 1Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA. Correspondence to: Jiin Woo <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Federated Sync. Q-learning (Fed Syn Q). Algorithm 2 Federated Async. Q-learning (Fed Asyn Q). |
| Open Source Code | No | No statement providing concrete access to source code for the described methodology was found. |
| Open Datasets | No | Experimental setup. Consider an MDP M = (S, A, P, r, g) described in Figure 2, where S = {0, 1} and A = {1, 2, , m}. The reward function r is set as r(s = 1, a) = 1 and r(s = 0, a) = 0 for any action a A, and the discount factor is set as g = 0.9. We now describe the transition kernel P. (This describes a constructed synthetic environment, not an existing public dataset.) |
| Dataset Splits | No | We run the algorithms for 100 simulations using samples randomly generated from the MDP and policies assigned to the agents. (No explicit mention of training, validation, or test splits.) |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, memory, etc.) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | No specific ancillary software details, such as library names with version numbers, were provided. |
| Experiment Setup | Yes | The Q-function is initialized with entries uniformly at random from (0, 1 1 g ] for each state-action pair. the learning rates of Fed Asyn Q-Im Avg and Fed Asyn Q-Eq Avg are set as = 0.05 and = 0.2, where each algorithm converges to the same error floor at the fastest speed, respectively. (Also refers to K=20 and =50 in figure captions, indicating specific settings.) |