Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Federated Q-Learning: Linear Regret Speedup with Low Communication Cost
Authors: Zhong Zheng, Fengyu Gao, Lingzhou Xue, Jing Yang
ICLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct experiments in a synthetic environment to validate the theoretical performances of Fed Q-Hoeffding, Fed Q-Beinstein, and compare with their single-user counterparts UCB-H and UCB-B (Jin et al., 2018), respectively. |
| Researcher Affiliation | Academia | Zhong Zheng, Fengyu Gao, Lingzhou Xue & Jing Yang The Pennsylvania State University EMAIL |
| Pseudocode | Yes | Algorithms 1 and 2 formally present the Hoeffding-type design. [...] Algorithm 1 Fed Q-Hoeffding (Central Server) [...] Algorithm 2 Fed Q-Hoeffding (Agent m in round k) |
| Open Source Code | Yes | Numerical experiments in this paper can be fully reproduced via the publicly available code3. 3https://openreview.net/attachment?id=fe6ANBxcKM&name=supplementary_material |
| Open Datasets | No | In this section, we conduct experiments in a synthetic environment to validate the theoretical performances of Fed Q-Hoeffding, Fed Q-Beinstein, and compare with their single-user counterparts UCB-H and UCB-B (Jin et al., 2018), respectively. |
| Dataset Splits | No | The paper describes experiments in a 'synthetic environment' but does not specify any train/validation/test splits or their proportions, as data is generated through interaction in an RL setting. |
| Hardware Specification | No | The paper describes conducting 'numerical experiments' but does not specify any hardware details such as CPU, GPU models, or memory. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers (e.g., Python 3.x, PyTorch 1.x) that would allow for reproducible setup of software dependencies. |
| Experiment Setup | Yes | We set the number of states S to be 3, the number of actions A for each state to be 2, and the episode length H to be 5. The reward rh(s, a) for each state-action pair and each step is generated independently and uniformly at random from [0, 1]. We also generate the transition kernel Ph( | s, a) from an S-dimensional simplex independently and uniformly at random for each state-action pair and each step. Such procedure guarantees that the synthetic environment is a proper tabular MDP. Under the given MDP, we set M = 10 and T/H = 3 10^4 for Fed Q-Hoeffding, Fed Q-Beinstein, and T/H = 3 10^5, M = 1 for UCB-H and UCB-B. Thus, the total number of episodes is 3 10^5 for all four algorithms. We choose c = ι = 1 for all algorithms. |