The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup and Beyond
Authors: Jiin Woo, Gauri Joshi, Yuejie Chi
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct numerical experiments to demonstrate the performance of the asynchronous Q-learning algorithms (FedAsynQ-EqAvg and FedAsynQ-ImAvg). Experimental setup. Consider an MDP $\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, r, \gamma)$ described in Figure 2, where $\mathcal{S} = \{0, 1\}$ and $\mathcal{A} = \{1, 2, \ldots, m\}$. Figure 3 shows the normalized Q-estimate error $(1-\gamma)\|\bar{Q}_T - Q^\star\|_\infty$ with respect to the sample size $T$, with $K = 20$ and $\tau = 50$. |
| Researcher Affiliation | Academia | Jiin Woo, Gauri Joshi, Yuejie Chi (Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA). Correspondence to: Jiin Woo <jiinw@andrew.cmu.edu>. |
| Pseudocode | Yes | Algorithm 1: Federated Synchronous Q-learning (FedSynQ). Algorithm 2: Federated Asynchronous Q-learning (FedAsynQ). (A hedged sketch of the asynchronous variant with periodic averaging appears below the table.) |
| Open Source Code | No | No statement providing concrete access to source code for the described methodology was found. |
| Open Datasets | No | Experimental setup. Consider an MDP $\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, r, \gamma)$ described in Figure 2, where $\mathcal{S} = \{0, 1\}$ and $\mathcal{A} = \{1, 2, \ldots, m\}$. The reward function $r$ is set as $r(s=1, a) = 1$ and $r(s=0, a) = 0$ for any action $a \in \mathcal{A}$, and the discount factor is set as $\gamma = 0.9$. We now describe the transition kernel $P$. (This describes a constructed synthetic environment, not an existing public dataset.) |
| Dataset Splits | No | We run the algorithms for 100 simulations using samples randomly generated from the MDP and policies assigned to the agents. (No explicit mention of training, validation, or test splits.) |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, memory, etc.) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | No specific ancillary software details, such as library names with version numbers, were provided. |
| Experiment Setup | Yes | The Q-function is initialized with entries drawn uniformly at random from $(0, \frac{1}{1-\gamma}]$ for each state-action pair. The learning rates of FedAsynQ-ImAvg and FedAsynQ-EqAvg are set as $\eta = 0.05$ and $\eta = 0.2$, respectively, the values at which each algorithm converges to the same error floor at the fastest speed. (The figure captions also refer to $K = 20$ and $\tau = 50$, indicating specific settings; a hedged sketch of this setup appears after the algorithm sketch below.) |
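
Below is a minimal Python sketch of federated asynchronous Q-learning with periodic equal-weight averaging, in the spirit of FedAsynQ-EqAvg (Algorithm 2 referenced above). It is not the authors' implementation; the function name `fed_asyn_q_eq_avg` and the behavior-policy and trajectory bookkeeping are illustrative assumptions made for this report.

```python
# Hedged sketch of federated asynchronous Q-learning with periodic
# equal-weight averaging, in the spirit of FedAsynQ-EqAvg (Algorithm 2).
# Function and variable names are illustrative, not taken from the paper.
import numpy as np

def fed_asyn_q_eq_avg(P, r, gamma, behavior_policies, K, T, tau, eta, rng):
    """K agents run local asynchronous Q-learning along their own Markovian
    trajectories; every tau steps the server averages the local Q-tables.

    P                 : transition kernel, shape (S, A, S)
    r                 : reward table, shape (S, A)
    behavior_policies : list of K arrays of shape (S, A), rows summing to 1
    """
    S, A = r.shape
    # Common random initialization over (0, 1/(1 - gamma)], as in the reported setup.
    Q_init = rng.uniform(0.0, 1.0 / (1.0 - gamma), size=(S, A))
    Q_local = np.repeat(Q_init[None], K, axis=0)   # (K, S, A) local estimates
    states = rng.integers(0, S, size=K)            # each agent's current state

    for t in range(1, T + 1):
        for k in range(K):
            s = int(states[k])
            a = rng.choice(A, p=behavior_policies[k][s])   # behavior action
            s_next = rng.choice(S, p=P[s, a])              # environment transition
            td_target = r[s, a] + gamma * Q_local[k, s_next].max()
            # Asynchronous update: only the visited (s, a) entry is changed.
            Q_local[k, s, a] = (1 - eta) * Q_local[k, s, a] + eta * td_target
            states[k] = s_next
        if t % tau == 0:
            # Equal-weight averaging at the server, then broadcast back.
            Q_local[:] = Q_local.mean(axis=0)
    return Q_local.mean(axis=0)
```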
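A second sketch instantiates the reported experimental setup and reuses `fed_asyn_q_eq_avg` from above. The two-state MDP, the rewards, gamma = 0.9, K = 20, tau = 50, the learning rates 0.05 / 0.2, and the 100-simulation count come from the table; the number of actions m, the horizon T, the behavior policies, and in particular the transition kernel are placeholders, since the paper specifies P in its Figure 2, which is not reproduced in this report.

```python
# Hedged sketch of the reported experimental setup. Values taken from the
# table above: two states, r(s=1, .) = 1, r(s=0, .) = 0, gamma = 0.9,
# K = 20, tau = 50, learning rates 0.05 (ImAvg) / 0.2 (EqAvg), 100 runs.
# The number of actions m, the horizon T, the behavior policies, and the
# transition kernel below are placeholders; the paper defines P in its
# Figure 2, which this report does not reproduce.
import numpy as np

S, A = 2, 4                          # states {0, 1}; m = 4 actions is an assumption
gamma = 0.9
K, tau = 20, 50
eta_im_avg, eta_eq_avg = 0.05, 0.2   # reported learning rates for ImAvg / EqAvg
num_runs, T = 100, 50_000            # 100 simulations reported; one run shown below

# Reward depends only on the state: r(s=1, a) = 1 and r(s=0, a) = 0.
r = np.zeros((S, A))
r[1, :] = 1.0

# Placeholder transition kernel P[s, a, s'] (NOT the paper's Figure 2 kernel):
# each action keeps the current state with a different probability.
P = np.zeros((S, A, S))
stay = np.linspace(0.5, 0.9, A)
for a in range(A):
    P[1, a, 1], P[1, a, 0] = stay[a], 1.0 - stay[a]
    P[0, a, 0], P[0, a, 1] = stay[a], 1.0 - stay[a]

rng = np.random.default_rng(0)
# Heterogeneous behavior policies: an independent random action distribution
# per agent and state, matching the report's "policies assigned to the agents".
behavior_policies = [rng.dirichlet(np.ones(A), size=S) for _ in range(K)]

# One run of the equal-averaging variant sketched above.
Q_hat = fed_asyn_q_eq_avg(P, r, gamma, behavior_policies, K, T, tau,
                          eta=eta_eq_avg, rng=rng)

# Q* via value iteration on the tabular model, and the normalized error
# (1 - gamma) * ||Q_hat - Q*||_inf that Figure 3 plots against T.
Q_star = np.zeros((S, A))
for _ in range(2000):
    Q_star = r + gamma * P @ Q_star.max(axis=1)
err = (1.0 - gamma) * np.abs(Q_hat - Q_star).max()
print(f"normalized error: {err:.4f}")
```

The final lines compute the same normalized error metric that the paper reports in Figure 3, here on the placeholder kernel only, so the numbers are not comparable to the published curves.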