Fault-Tolerant Federated Reinforcement Learning with Theoretical Guarantee

Authors: Xiaofeng Fan, Yining Ma, Zhongxiang Dai, Wei Jing, Cheston Tan, Bryan Kian Hsiang Low

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | All theoretical results are empirically verified on various RL benchmark tasks. We also demonstrate its empirical efficacy on various RL benchmark tasks (Section 5).
Researcher Affiliation | Collaboration | 1. Dept. of Computer Science, National University of Singapore, Republic of Singapore; 2. Dept. of ISEM, National University of Singapore, Republic of Singapore; 3. Institute for Infocomm Research, A*STAR, Republic of Singapore; 4. Alibaba DAMO Academy, Hangzhou, China
Pseudocode | Yes | Algorithm 1 FedPG-BR; Algorithm 1.1 FedPG-Aggregate
Open Source Code | Yes | The code and instructions to reproduce the results are given in our github repository: https://github.com/flint-xf-fan/Byzantine-Federeated-RL
Open Datasets | Yes | We evaluate the empirical performances of FedPG-BR with and without Byzantine agents on different RL benchmarks, including Cart Pole balancing [55], Lunar Lander, and the 3D continuous locomotion control task of Half-Cheetah [56].
Dataset Splits | No | The paper describes how policies are evaluated through interaction with the environment, but does not specify traditional training/validation/test *dataset* splits as would be common in supervised learning. RL often involves continuous interaction rather than static data splits for validation.
Hardware Specification | Yes | All experiments are conducted on an internal cluster of machines with Intel(R) Xeon(R) Gold 6130 CPUs (2.10GHz, 16 cores), 192GB RAM, and NVIDIA V100 GPUs.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers for their implementation (e.g., Python, PyTorch versions).
Experiment Setup | Yes | if we choose $\eta_t = \frac{1}{2\Psi B_t^{2/3}}$, $b_t = 1$, and $B_t = B \geq \left(\frac{4\Phi}{L}\right)^2$ where $\Phi \triangleq L_g + C_g^2 C_w$ and $\Psi \triangleq \left(L(L_g + C_g^2 C_w)\right)^{1/3}$...; For CartPole-v1, the episode horizon is set to 200, the server batch size $B_t$ is set to 1000, and the server mini-batch size $b_t$ is set to 10. The number of training iterations $T$ is set to 5000.
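To make the quoted experiment setup concrete, below is a minimal, hypothetical Python sketch (not the authors' released code) of how the corollary's prescribed step size and the reported CartPole-v1 hyperparameters might be wired together. The smoothness/variance constants L, L_g, C_g, C_w are problem-dependent quantities from the paper's assumptions and are given placeholder values here; all variable and key names are illustrative.

```python
# Hypothetical sketch of the quoted experiment setup; not the authors' code.
# L, L_g, C_g, C_w are problem-dependent constants from the paper's
# assumptions -- the values below are placeholders, not values from the paper.
L, L_g, C_g, C_w = 1.0, 1.0, 1.0, 1.0

Phi = L_g + C_g ** 2 * C_w        # Phi = L_g + C_g^2 * C_w
Psi = (L * Phi) ** (1.0 / 3.0)    # Psi = (L * (L_g + C_g^2 * C_w))^(1/3)

# Reported CartPole-v1 settings (horizon, batch sizes, iterations).
config = {
    "env_id": "CartPole-v1",
    "horizon": 200,    # episode horizon
    "B_t": 1000,       # server batch size
    "b_t": 10,         # server mini-batch size
    "T": 5000,         # number of training iterations
}

# Step size prescribed by the corollary: eta_t = 1 / (2 * Psi * B_t^(2/3)).
eta_t = 1.0 / (2.0 * Psi * config["B_t"] ** (2.0 / 3.0))
print(f"prescribed step size eta_t = {eta_t:.6f}")
```

With the placeholder constants this is only a shape-of-the-computation illustration; in an actual run the constants would be estimated for the task at hand or the step size tuned directly.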