The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup and Beyond

Authors: Jiin Woo, Gauri Joshi, Yuejie Chi

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct numerical experiments to demonstrate the performance of the asynchronous Q-learning algorithms (FedAsynQ-EqAvg and FedAsynQ-ImAvg). Experimental setup. Consider an MDP M = (S, A, P, r, γ) described in Figure 2, where S = {0, 1} and A = {1, 2, ..., m}. Figure 3 shows the normalized Q-estimate error (1 − γ)‖Q_T − Q*‖∞ with respect to the sample size T, with K = 20 and τ = 50. (A code sketch of this error metric appears after the table.)
Researcher Affiliation | Academia | Jiin Woo, Gauri Joshi, Yuejie Chi, Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA. Correspondence to: Jiin Woo <jiinw@andrew.cmu.edu>.
Pseudocode | Yes | Algorithm 1: Federated synchronous Q-learning (FedSynQ). Algorithm 2: Federated asynchronous Q-learning (FedAsynQ). (A minimal sketch of the shared periodic-averaging structure appears after the table.)
Open Source Code | No | No statement providing concrete access to source code for the described methodology was found.
Open Datasets | No | Experimental setup. Consider an MDP M = (S, A, P, r, γ) described in Figure 2, where S = {0, 1} and A = {1, 2, ..., m}. The reward function r is set as r(s = 1, a) = 1 and r(s = 0, a) = 0 for any action a ∈ A, and the discount factor is set as γ = 0.9. We now describe the transition kernel P. (This describes a constructed synthetic environment, not an existing public dataset; see the MDP construction sketch after the table.)
Dataset Splits | No | We run the algorithms for 100 simulations using samples randomly generated from the MDP and policies assigned to the agents. (No explicit mention of training, validation, or test splits.)
Hardware Specification | No | No specific hardware details (GPU/CPU models, memory, etc.) used for running experiments were mentioned in the paper.
Software Dependencies | No | No specific ancillary software details, such as library names with version numbers, were provided.
Experiment Setup | Yes | The Q-function is initialized with entries uniformly at random from (0, 1/(1 − γ)] for each state-action pair. The learning rates of FedAsynQ-ImAvg and FedAsynQ-EqAvg are set as η = 0.05 and η = 0.2, respectively, where each algorithm converges to the same error floor at the fastest speed. (Also refers to K = 20 and τ = 50 in figure captions, indicating specific settings; see the initialization sketch after the table.)
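
The quoted setup describes a small synthetic MDP rather than a public dataset. The following is a minimal sketch of such an environment; the rewards and discount factor follow the quote, while the transition kernel, the `p_stay` parameter, and the default m = 10 are hypothetical placeholders, since the report does not reproduce the paper's exact P.

```python
import numpy as np

def build_two_state_mdp(m=10, gamma=0.9, p_stay=0.9):
    """Synthetic MDP with S = {0, 1} and A = {1, ..., m}, as quoted above.

    Rewards follow the quote: r(s=1, a) = 1 and r(s=0, a) = 0 for every a.
    The transition kernel below (controlled by p_stay) is a HYPOTHETICAL
    placeholder; the report does not reproduce the paper's exact P.
    """
    n_states, n_actions = 2, m
    r = np.zeros((n_states, n_actions))
    r[1, :] = 1.0                          # reward 1 in state 1, 0 in state 0
    P = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        P[s, :, s] = p_stay                # stay in the current state (assumed)
        P[s, :, 1 - s] = 1.0 - p_stay      # otherwise switch states (assumed)
    return P, r, gamma
```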
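
The experiment-setup row quotes the initialization range (0, 1/(1 − γ)] and the tuned learning rates. A minimal sketch of that initialization, assuming NumPy and the quoted γ = 0.9; the constant names for the learning rates are illustrative, not the paper's notation.

```python
import numpy as np

def init_q(n_states, n_actions, gamma=0.9, rng=None):
    """Draw each Q(s, a) uniformly at random from (0, 1/(1 - gamma)],
    matching the quoted initialization."""
    rng = np.random.default_rng() if rng is None else rng
    u = 1.0 - rng.random((n_states, n_actions))   # uniform on (0, 1]
    return u / (1.0 - gamma)

# Learning rates as quoted in the report (illustrative constant names):
ETA_IMAVG = 0.05   # FedAsynQ-ImAvg
ETA_EQAVG = 0.2    # FedAsynQ-EqAvg
```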
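
The pseudocode row names the paper's two federated algorithms. Below is a minimal sketch of the structure they share: K agents take local Q-learning steps and a server averages their tables with equal weights every τ steps. This is not the authors' exact Algorithm 1 or 2; the synchronous generative-model sampling, the constant learning rate, the common initialization, and the function name are assumptions made for illustration.

```python
import numpy as np

def fed_q_learning(P, r, gamma, K=20, tau=50, T=5000, eta=0.1, seed=0):
    """Sketch of federated Q-learning with periodic equal-weight averaging,
    in the spirit of FedSynQ / FedAsynQ-EqAvg (not the authors' exact code)."""
    rng = np.random.default_rng(seed)
    n_s, n_a, _ = P.shape
    Q = np.full((K, n_s, n_a), 0.5 / (1.0 - gamma))   # common init (assumption)
    for t in range(1, T + 1):
        for k in range(K):                 # each agent takes a local update
            for s in range(n_s):
                for a in range(n_a):
                    s_next = rng.choice(n_s, p=P[s, a])
                    target = r[s, a] + gamma * Q[k, s_next].max()
                    Q[k, s, a] += eta * (target - Q[k, s, a])
        if t % tau == 0:                   # communication round every tau steps
            Q[:] = Q.mean(axis=0)          # equal-weight server averaging
    return Q.mean(axis=0)
```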
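
The research-type row quotes the evaluation metric, the normalized Q-estimate error (1 − γ)‖Q_T − Q*‖∞. A sketch of how that metric could be computed, assuming Q* is obtained by plain value iteration on the known model (the `optimal_q` helper is an assumption, not something quoted from the paper).

```python
import numpy as np

def optimal_q(P, r, gamma, iters=2000):
    """Q* via value iteration on the known model (for evaluation only)."""
    Q = np.zeros(r.shape)
    for _ in range(iters):
        Q = r + gamma * P @ Q.max(axis=1)   # Bellman optimality update
    return Q

def normalized_error(Q_T, Q_star, gamma=0.9):
    """Quoted metric: (1 - gamma) * ||Q_T - Q*||_inf."""
    return (1.0 - gamma) * np.abs(Q_T - Q_star).max()
```

With these pieces, the shape of the reported Figure 3 curves could in principle be approximated by averaging `normalized_error` over repeated runs, mirroring the 100 simulations mentioned in the dataset-splits row.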