Fault-Tolerant Federated Reinforcement Learning with Theoretical Guarantee
Authors: Xiaofeng Fan, Yining Ma, Zhongxiang Dai, Wei Jing, Cheston Tan, Bryan Kian Hsiang Low
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | All theoretical results are empirically verified on various RL benchmark tasks. We also demonstrate its empirical efficacy on various RL benchmark tasks (Section 5). |
| Researcher Affiliation | Collaboration | (1) Dept. of Computer Science, National University of Singapore, Republic of Singapore; (2) Dept. of ISEM, National University of Singapore, Republic of Singapore; (3) Institute for Infocomm Research, A*STAR, Republic of Singapore; (4) Alibaba DAMO Academy, Hangzhou, China |
| Pseudocode | Yes | Algorithm 1: FedPG-BR; Algorithm 1.1: FedPG-Aggregate (a hedged aggregation sketch follows the table). |
| Open Source Code | Yes | The code and instructions to reproduce the results are given in our github repository: https://github.com/flint-xf-fan/Byzantine-Federeated-RL |
| Open Datasets | Yes | We evaluate the empirical performances of FedPG-BR with and without Byzantine agents on different RL benchmarks, including CartPole balancing [55], Lunar Lander, and the 3D continuous locomotion control task of Half-Cheetah [56]. |
| Dataset Splits | No | The paper describes how policies are evaluated through interaction with the environment, but does not specify traditional training/validation/test *dataset* splits as would be common in supervised learning. RL often involves continuous interaction rather than static data splits for validation. |
| Hardware Specification | Yes | All experiments are conducted on an internal cluster of machines with Intel(R) Xeon(R) Gold 6130 CPUs (2.10GHz, 16 cores), 192GB RAM, and NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper does not specify software dependencies or version numbers for its implementation (e.g., Python or PyTorch versions). |
| Experiment Setup | Yes | if we choose $\eta_t = \frac{1}{2\Psi B_t^{2/3}}$, $b_t = 1$, and $B_t = B \geq (4\Phi/L)^2$, where $\Phi \triangleq L_g + C_g^2 C_w$ and $\Psi \triangleq (L(L_g + C_g^2 C_w))^{1/3}$...; For CartPole-v1, the episode horizon is set to 200, the server batch size $B_t$ is set to 1000, and the server mini-batch size $b_t$ is set to 10. The number of training iterations $T$ is set to 5000 (these choices are written out in the step-size sketch after the table). |
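
To make the filtering idea behind Algorithm 1.1 (FedPG-Aggregate) concrete, below is a minimal Python sketch of Byzantine-filtered gradient aggregation. This is not the paper's exact rule: the coordinate-wise median reference point and the fixed distance threshold are illustrative assumptions only; the authors' actual filtering criteria are specified in the paper and the linked repository.

```python
# Hedged sketch of Byzantine-filtered gradient aggregation, in the spirit of
# FedPG-Aggregate (Algorithm 1.1). NOT the paper's exact filtering rule: the
# median center and the fixed threshold below are illustrative assumptions.
import numpy as np

def filtered_aggregate(grads: np.ndarray, threshold: float) -> np.ndarray:
    """grads: (K, d) array of per-agent policy-gradient estimates.
    Drops estimates far from the coordinate-wise median, averages the rest."""
    center = np.median(grads, axis=0)               # robust reference point
    dists = np.linalg.norm(grads - center, axis=1)  # per-agent deviation
    keep = dists <= threshold                       # suspected Byzantine agents are dropped
    if not keep.any():                              # degenerate case: keep the closest agent
        keep[np.argmin(dists)] = True
    return grads[keep].mean(axis=0)

# Usage: 10 honest agents plus 3 Byzantine agents sending large random vectors.
rng = np.random.default_rng(0)
honest = rng.normal(0.0, 0.1, size=(10, 4)) + 1.0
byzantine = rng.normal(0.0, 10.0, size=(3, 4))
g = filtered_aggregate(np.vstack([honest, byzantine]), threshold=1.0)
print(g)  # close to the honest mean, roughly [1, 1, 1, 1]
```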
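
The step-size and batch-size choices quoted in the Experiment Setup row can be written out directly. In the sketch below, $L$, $L_g$, $C_g$, and $C_w$ are the smoothness and gradient-bound constants from the paper's assumptions; the numeric values passed in are placeholders for illustration, not values reported by the authors.

```python
# Minimal sketch of the theoretical step-size rule quoted above:
#   eta_t = 1 / (2 * Psi * B_t^(2/3)),  Phi = L_g + C_g^2 * C_w,  Psi = (L * Phi)^(1/3).
# L, L_g, C_g, C_w are problem-dependent constants from the paper's assumptions;
# the values used at the bottom are placeholders only.

def theoretical_step_size(B_t: int, L: float, L_g: float, C_g: float, C_w: float) -> float:
    phi = L_g + C_g ** 2 * C_w          # Phi = L_g + C_g^2 * C_w
    psi = (L * phi) ** (1.0 / 3.0)      # Psi = (L * Phi)^(1/3)
    return 1.0 / (2.0 * psi * B_t ** (2.0 / 3.0))

# CartPole-v1 settings quoted in the table: episode horizon 200, server batch
# size B_t = 1000, server mini-batch size b_t = 10, T = 5000 training iterations.
CARTPOLE = {"horizon": 200, "B_t": 1000, "b_t": 10, "T": 5000}

if __name__ == "__main__":
    # Placeholder constants (NOT from the paper):
    eta = theoretical_step_size(B_t=CARTPOLE["B_t"], L=1.0, L_g=1.0, C_g=1.0, C_w=1.0)
    print(f"eta_t = {eta:.6f}")
```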