The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup and Beyond

Authors: Jiin Woo, Gauri Joshi, Yuejie Chi

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct numerical experiments to demonstrate the performance of the asynchronous Q-learning algorithms (FedAsynQ-EqAvg and FedAsynQ-ImAvg). Experimental setup. Consider an MDP M = (S, A, P, r, γ) described in Figure 2, where S = {0, 1} and A = {1, 2, ..., m}. Figure 3 shows the normalized Q-estimate error (1 − γ)‖Q_T − Q*‖∞ with respect to the sample size T, with K = 20 and τ = 50. (A code sketch of this error metric appears after the table.)
Researcher Affiliation | Academia | Jiin Woo, Gauri Joshi, Yuejie Chi, Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA. Correspondence to: Jiin Woo <jiinw@andrew.cmu.edu>.
Pseudocode | Yes | Algorithm 1: Federated synchronous Q-learning (FedSynQ). Algorithm 2: Federated asynchronous Q-learning (FedAsynQ). (A minimal sketch of the shared periodic-averaging structure appears after the table.)
Open Source Code | No | No statement providing concrete access to source code for the described methodology was found.
Open Datasets | No | Experimental setup. Consider an MDP M = (S, A, P, r, γ) described in Figure 2, where S = {0, 1} and A = {1, 2, ..., m}. The reward function r is set as r(s = 1, a) = 1 and r(s = 0, a) = 0 for any action a ∈ A, and the discount factor is set as γ = 0.9. We now describe the transition kernel P. (This describes a constructed synthetic environment, not an existing public dataset; see the MDP construction sketch after the table.)
Dataset Splits | No | We run the algorithms for 100 simulations using samples randomly generated from the MDP and policies assigned to the agents. (No explicit mention of training, validation, or test splits.)
Hardware Specification | No | No specific hardware details (GPU/CPU models, memory, etc.) used for running experiments were mentioned in the paper.
Software Dependencies | No | No specific ancillary software details, such as library names with version numbers, were provided.
Experiment Setup | Yes | The Q-function is initialized with entries uniformly at random from (0, 1/(1 − γ)] for each state-action pair. The learning rates of FedAsynQ-ImAvg and FedAsynQ-EqAvg are set as η = 0.05 and η = 0.2, respectively, where each algorithm converges to the same error floor at the fastest speed. (Also refers to K = 20 and τ = 50 in figure captions, indicating specific settings; see the initialization sketch after the table.)
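
The quoted setup describes a small synthetic MDP rather than a public dataset. The following is a minimal sketch of such an environment; the rewards and discount factor follow the quote, while the transition kernel, the `p_stay` parameter, and the default m = 10 are hypothetical placeholders, since the report does not reproduce the paper's exact P.

```python
import numpy as np

def build_two_state_mdp(m=10, gamma=0.9, p_stay=0.9):
    """Synthetic MDP with S = {0, 1} and A = {1, ..., m}, as quoted above.

    Rewards follow the quote: r(s=1, a) = 1 and r(s=0, a) = 0 for every a.
    The transition kernel below (controlled by p_stay) is a HYPOTHETICAL
    placeholder; the report does not reproduce the paper's exact P.
    """
    n_states, n_actions = 2, m
    r = np.zeros((n_states, n_actions))
    r[1, :] = 1.0                          # reward 1 in state 1, 0 in state 0
    P = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        P[s, :, s] = p_stay                # stay in the current state (assumed)
        P[s, :, 1 - s] = 1.0 - p_stay      # otherwise switch states (assumed)
    return P, r, gamma
```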
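
The experiment-setup row quotes the initialization range (0, 1/(1 − γ)] and the tuned learning rates. A minimal sketch of that initialization, assuming NumPy and the quoted γ = 0.9; the constant names for the learning rates are illustrative, not the paper's notation.

```python
import numpy as np

def init_q(n_states, n_actions, gamma=0.9, rng=None):
    """Draw each Q(s, a) uniformly at random from (0, 1/(1 - gamma)],
    matching the quoted initialization."""
    rng = np.random.default_rng() if rng is None else rng
    u = 1.0 - rng.random((n_states, n_actions))   # uniform on (0, 1]
    return u / (1.0 - gamma)

# Learning rates as quoted in the report (illustrative constant names):
ETA_IMAVG = 0.05   # FedAsynQ-ImAvg
ETA_EQAVG = 0.2    # FedAsynQ-EqAvg
```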
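
The pseudocode row names the paper's two federated algorithms. Below is a minimal sketch of the structure they share: K agents take local Q-learning steps and a server averages their tables with equal weights every τ steps. This is not the authors' exact Algorithm 1 or 2; the synchronous generative-model sampling, the constant learning rate, the common initialization, and the function name are assumptions made for illustration.

```python
import numpy as np

def fed_q_learning(P, r, gamma, K=20, tau=50, T=5000, eta=0.1, seed=0):
    """Sketch of federated Q-learning with periodic equal-weight averaging,
    in the spirit of FedSynQ / FedAsynQ-EqAvg (not the authors' exact code)."""
    rng = np.random.default_rng(seed)
    n_s, n_a, _ = P.shape
    Q = np.full((K, n_s, n_a), 0.5 / (1.0 - gamma))   # common init (assumption)
    for t in range(1, T + 1):
        for k in range(K):                 # each agent takes a local update
            for s in range(n_s):
                for a in range(n_a):
                    s_next = rng.choice(n_s, p=P[s, a])
                    target = r[s, a] + gamma * Q[k, s_next].max()
                    Q[k, s, a] += eta * (target - Q[k, s, a])
        if t % tau == 0:                   # communication round every tau steps
            Q[:] = Q.mean(axis=0)          # equal-weight server averaging
    return Q.mean(axis=0)
```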
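
The research-type row quotes the evaluation metric, the normalized Q-estimate error (1 − γ)‖Q_T − Q*‖∞. A sketch of how that metric could be computed, assuming Q* is obtained by plain value iteration on the known model (the `optimal_q` helper is an assumption, not something quoted from the paper).

```python
import numpy as np

def optimal_q(P, r, gamma, iters=2000):
    """Q* via value iteration on the known model (for evaluation only)."""
    Q = np.zeros(r.shape)
    for _ in range(iters):
        Q = r + gamma * P @ Q.max(axis=1)   # Bellman optimality update
    return Q

def normalized_error(Q_T, Q_star, gamma=0.9):
    """Quoted metric: (1 - gamma) * ||Q_T - Q*||_inf."""
    return (1.0 - gamma) * np.abs(Q_T - Q_star).max()
```

With these pieces, the shape of the reported Figure 3 curves could in principle be approximated by averaging `normalized_error` over repeated runs, mirroring the 100 simulations mentioned in the dataset-splits row.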