Federated Ensemble-Directed Offline Reinforcement Learning

Authors: Desik Rengarajan, Nitin Ragothaman, Dileep Kalathil, Srinivas Shakkottai

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate FEDORA on a variety of MuJoCo environments and real-world datasets and show that it outperforms several other approaches, including performing offline RL on a pooled dataset. We also demonstrate FEDORA's excellent performance via real-world experiments on a TurtleBot robot [1].
Researcher Affiliation | Academia | Desik Rengarajan, Nitin Ragothaman, Dileep Kalathil, Srinivas Shakkottai; Department of Electrical and Computer Engineering, Texas A&M University
Pseudocode | Yes | Algorithm 1: Outline of Client i's Algorithm. (A generic federation-round skeleton illustrating the client/server structure follows the table.)
Open Source Code | Yes | We provide our code and a video of our experiments at https://github.com/DesikRengarajan/FEDORA.
Open Datasets | Yes | We consider the Hopper environment from MuJoCo [28], with |N| = 10 and |D_i| = 5000, and we use the data from the D4RL dataset [4]. (An illustrative partitioning sketch follows the table.)
Dataset Splits | No | The paper describes training and testing procedures, and uses 'validation' only in a general sense (algorithm robustness); it does not explicitly specify a validation dataset split (e.g., percentages or sample counts) for its experiments.
Hardware Specification | Yes | Each run on the MuJoCo environments (as in Fig. 2) takes around 7 hours to complete when run on a single machine (AMD Ryzen Threadripper 3960X 24-core processor, 2× NVIDIA 2080 Ti GPUs).
Software Dependencies | No | The paper mentions software such as the 'PyTorch framework' and the 'Flower federated learning platform [2]' but does not provide specific version numbers for these dependencies. For example: 'We use the PyTorch framework to program the algorithms in this work, based on a publicly-available TD3-BC implementation.' (A hedged sketch of the standard TD3+BC actor update follows the table.)
Experiment Setup | Yes | We use a discount factor of 0.99, and the clients update their networks using the Adam optimizer with a learning rate of 3 × 10⁻⁴. For training FEDORA, we fixed the decay rate δ = 0.995 and the temperature β = 0.1. The batch size is 256 in both federated and centralized training. During a round of federation, each client performs 20 epochs of local training in all algorithms, which is roughly 380 local gradient steps in our experimental setup. (These values are gathered into a configuration sketch after the table.)
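
The Open Datasets row quotes |N| = 10 clients with |D_i| = 5000 transitions each, drawn from the D4RL Hopper data. The sketch below shows one way such shards could be carved out; the dataset variant ("hopper-medium-v2") and the contiguous slicing are assumptions for illustration, not the paper's documented partitioning procedure.

```python
# Hypothetical sketch: carving a D4RL Hopper dataset into |N| = 10 client
# shards of |D_i| = 5000 transitions each. The dataset variant and the
# contiguous slicing are assumptions, not the paper's exact procedure.
import gym
import d4rl  # registers the D4RL environments with gym

NUM_CLIENTS = 10   # |N| in the paper
SHARD_SIZE = 5000  # |D_i| in the paper

env = gym.make("hopper-medium-v2")   # assumed dataset variant
data = d4rl.qlearning_dataset(env)   # dict of transition arrays

client_datasets = []
for i in range(NUM_CLIENTS):
    lo, hi = i * SHARD_SIZE, (i + 1) * SHARD_SIZE
    client_datasets.append({k: v[lo:hi] for k, v in data.items()})

print(len(client_datasets), client_datasets[0]["observations"].shape)
```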
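The Software Dependencies row notes that the clients build on a publicly available TD3-BC implementation. Below is a minimal PyTorch rendition of the standard TD3+BC actor update (maximize the critic value while behavior-cloning toward dataset actions); the network sizes, the α = 2.5 scaling constant, and the Hopper-sized dimensions are assumptions carried over from the reference TD3+BC implementation, not taken from the FEDORA code.

```python
# Minimal sketch of the standard TD3+BC actor update that the paper's
# clients reportedly build on. alpha = 2.5, the one-hidden-layer MLPs, and
# the Hopper-sized dimensions are assumptions, not the FEDORA code itself.
import torch
import torch.nn as nn
import torch.nn.functional as F

state_dim, action_dim, alpha = 11, 3, 2.5  # Hopper-sized dims (assumed)

actor = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                      nn.Linear(256, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
                       nn.Linear(256, 1))
opt = torch.optim.Adam(actor.parameters(), lr=3e-4)  # lr quoted in the paper

def td3_bc_actor_step(states, actions):
    """One actor step: loss = -lambda * Q(s, pi(s)) + MSE(pi(s), a)."""
    pi = actor(states)
    q = critic(torch.cat([states, pi], dim=1))
    lam = alpha / q.abs().mean().detach()  # adaptive scaling from TD3+BC
    loss = -lam * q.mean() + F.mse_loss(pi, actions)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Example call on a random mini-batch of the paper's batch size (256).
td3_bc_actor_step(torch.randn(256, state_dim), torch.randn(256, action_dim))
```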
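For context on the Pseudocode row ("Algorithm 1: Outline of Client i's Algorithm"), the skeleton below shows a plain FedAvg-style round: each client runs its local epochs, then the server averages the returned weights. This only illustrates the client/server round structure; FEDORA's actual aggregation is ensemble-directed and weights clients by performance, which this sketch does not implement.

```python
# Plain FedAvg-style round skeleton, shown only to illustrate the
# client/server structure. FEDORA's real aggregation is ensemble-directed
# (performance-weighted) and is NOT implemented here.
import copy
import torch

def local_update(global_model, shard, epochs=20):
    """Client i: run `epochs` of local training on its own data shard.
    The training body is a placeholder; in the paper each epoch is
    offline-RL training on the client's dataset."""
    model = copy.deepcopy(global_model)
    # ... run `epochs` epochs of offline RL on `shard` ...
    return model.state_dict()

def federated_round(global_model, client_shards):
    """One round: broadcast, local training, then simple weight averaging."""
    client_states = [local_update(global_model, s) for s in client_shards]
    avg_state = {
        k: torch.stack([st[k].float() for st in client_states]).mean(dim=0)
        for k in client_states[0]
    }
    global_model.load_state_dict(avg_state)
    return global_model
```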
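Finally, the hyperparameters quoted in the Experiment Setup row can be grouped into a single configuration object. The dataclass and its field names below are illustrative; only the values come from the paper.

```python
# Hyperparameters gathered from the Experiment Setup row above. The
# dataclass and field names are illustrative; only the values are from
# the paper.
from dataclasses import dataclass

@dataclass
class FedoraConfig:
    discount: float = 0.99       # discount factor
    learning_rate: float = 3e-4  # Adam, client network updates
    decay_rate: float = 0.995    # delta, data-decay rate
    temperature: float = 0.1     # beta
    batch_size: int = 256        # federated and centralized training
    local_epochs: int = 20       # per federation round (~380 gradient steps)

config = FedoraConfig()
```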