Finite-Time Analysis of On-Policy Heterogeneous Federated Reinforcement Learning

Authors: Chenyu Zhang, Han Wang, Aritra Mitra, James Anderson

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Figure 1, we plot the mean squared error averaged over ten runs for different heterogeneity levels and numbers of agents. The simulation results are consistent with Corollary 2.1 and demonstrate the robustness of our method towards environmental heterogeneity.
Researcher Affiliation | Academia | Chenyu Zhang, Data Science Institute, Columbia University, New York, NY 10025, USA, cz2736@columbia.edu; Han Wang, Department of Electrical Engineering, Columbia University, New York, NY 10025, USA, hw2786@columbia.edu; Aritra Mitra, Department of Electrical and Computer Engineering, NC State University, Raleigh, NC 27695, USA, amitra2@ncsu.edu; James Anderson, Department of Electrical Engineering, Columbia University, New York, NY 10025, USA, james.anderson@columbia.edu
Pseudocode | Yes | We present FedSARSA in Algorithm 1. Algorithm 1: FedSARSA
Open Source Code | No | The paper does not include a statement about releasing open-source code or provide a link to a code repository.
Open Datasets | No | To construct heterogeneous MDPs, we first generate a nominal MDP M1 and obtain the remaining MDPs by adding the perturbations to M1.
Dataset Splits | No | The paper conducts simulations in a generated environment, not on a traditional dataset with explicit train/validation/test splits described for reproduction.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | Yes | We create a finite state space of size |S| = 100, an action space of |A| = 100, a feature space of dimension d = 25, and set γ = 0.2 and R = 10. The actions determine the transition matrices by shifting the columns of a reference matrix. The synchronization period is set to K = 10, and the step-size of α0 = 0.01. For the full experiment setup, please refer to Appendix C.
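Since no code is released, the environment construction quoted above can only be approximated. The following is a minimal sketch, not the authors' implementation: it builds a nominal transition kernel by cyclically shifting the columns of one random reference matrix (one shift per action), then derives the remaining agents' kernels by perturbing the nominal one. The shift direction, the mixing-style perturbation, and all names (CONFIG, shift_columns, perturb_kernel, make_heterogeneous_kernels, epsilon) are assumptions made for illustration; the exact procedure is in the paper's Appendix C and is not reproduced here. Rewards and features are omitted.

```python
import random

# Values quoted in the experiment setup; the key names are hypothetical.
CONFIG = {"num_states": 100, "num_actions": 100, "feature_dim": 25,
          "gamma": 0.2, "reward_bound": 10.0, "sync_period": 10,
          "step_size": 0.01}


def random_stochastic_matrix(n, rng):
    """Return an n x n row-stochastic matrix (every row sums to 1)."""
    rows = []
    for _ in range(n):
        row = [rng.random() for _ in range(n)]
        total = sum(row)
        rows.append([x / total for x in row])
    return rows


def shift_columns(matrix, k):
    """Cyclically shift the columns of `matrix` k positions to the right."""
    n = len(matrix[0])
    k %= n
    return [row[n - k:] + row[:n - k] for row in matrix]


def nominal_kernel(num_states, num_actions, rng):
    """Build one transition matrix per action from a shared reference matrix."""
    reference = random_stochastic_matrix(num_states, rng)
    return [shift_columns(reference, a) for a in range(num_actions)]


def perturb_kernel(kernel, epsilon, rng):
    """Assumed perturbation: mix each row with a random distribution.

    Each row becomes (1 - epsilon) * row + epsilon * noise, so it stays a
    valid distribution and moves by at most epsilon in total variation.
    """
    perturbed = []
    for matrix in kernel:
        new_matrix = []
        for row in matrix:
            noise = [rng.random() for _ in row]
            total = sum(noise)
            new_matrix.append([(1 - epsilon) * p + epsilon * z / total
                               for p, z in zip(row, noise)])
        perturbed.append(new_matrix)
    return perturbed


def make_heterogeneous_kernels(num_agents, epsilon, num_states, num_actions,
                               seed=0):
    """Return one kernel per agent: the nominal one, then perturbed copies."""
    rng = random.Random(seed)
    nominal = nominal_kernel(num_states, num_actions, rng)
    others = [perturb_kernel(nominal, epsilon, rng)
              for _ in range(num_agents - 1)]
    return [nominal] + others


if __name__ == "__main__":
    # Small sizes for a quick check; the quoted setup uses 100 states and actions.
    kernels = make_heterogeneous_kernels(num_agents=3, epsilon=0.1,
                                         num_states=5, num_actions=4)
    print(len(kernels), len(kernels[0]), len(kernels[0][0]))
```

Mixing with a random distribution is chosen here only because it keeps every row normalized and gives a single knob (epsilon) for the heterogeneity level; a reproduction should follow the perturbation described in Appendix C instead.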