Federated Ensemble-Directed Offline Reinforcement Learning
Authors: Desik Rengarajan, Nitin Ragothaman, Dileep Kalathil, Srinivas Shakkottai
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate FEDORA on a variety of MuJoCo environments and real-world datasets and show that it outperforms several other approaches, including performing offline RL on a pooled dataset. We also demonstrate FEDORA's excellent performance via real-world experiments on a TurtleBot robot [1]. |
| Researcher Affiliation | Academia | Desik Rengarajan, Nitin Ragothaman, Dileep Kalathil, Srinivas Shakkottai; Department of Electrical and Computer Engineering, Texas A&M University |
| Pseudocode | Yes | Algorithm 1: Outline of Client i's Algorithm |
| Open Source Code | Yes | We provide our code and a video of our experiments at https://github.com/DesikRengarajan/FEDORA. |
| Open Datasets | Yes | We consider the Hopper environment from MuJoCo [28], with |N| = 10, |Di| = 5000, and we use the data from the D4RL dataset [4]. |
| Dataset Splits | No | The paper describes training and testing procedures, and uses terms like 'validation' in a general sense for algorithm robustness, but does not explicitly provide details about a specific validation dataset split (e.g., percentages or sample counts) for its experiments. |
| Hardware Specification | Yes | Each run on the MuJoCo environments (as in Fig. 2) takes around 7 hours to complete when run on a single machine (AMD Ryzen Threadripper 3960X 24-Core Processor, 2x NVIDIA 2080Ti GPU). |
| Software Dependencies | No | The paper mentions software like the 'PyTorch framework' and the 'Flower federated learning platform [2]' but does not provide specific version numbers for these dependencies. For example, 'We use the PyTorch framework to program the algorithms in this work, based on a publicly-available TD3-BC implementation.' |
| Experiment Setup | Yes | We use a discount factor of 0.99, and the clients update their networks using the Adam optimizer with a learning rate of 3 × 10⁻⁴. For training FEDORA, we fixed the decay rate δ = 0.995 and the temperature β = 0.1. The batch size is 256 in both federated and centralized training. During a round of federation, each client performs 20 epochs of local training in all algorithms, which is roughly 380 local gradient steps in our experimental setup. |
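
The experiment-setup row above lists all of the reported hyperparameters. The snippet below is a minimal sketch of how those values could be collected into a client-side configuration, assuming a PyTorch actor-critic learner in the style of TD3-BC; the names `CONFIG` and `make_optimizers` are hypothetical illustrations, not the authors' code.

```python
# Hypothetical sketch: gathering the hyperparameters reported in the paper
# into one place for a PyTorch client. Only the numeric values come from the
# paper; the structure and names are assumptions for illustration.
import torch

CONFIG = {
    "discount": 0.99,      # discount factor
    "lr": 3e-4,            # Adam learning rate for the clients' networks
    "decay_rate": 0.995,   # FEDORA decay rate (delta)
    "temperature": 0.1,    # FEDORA temperature (beta)
    "batch_size": 256,     # same in federated and centralized training
    "local_epochs": 20,    # local training epochs per federation round
}

def make_optimizers(actor: torch.nn.Module, critic: torch.nn.Module):
    """Both client networks are updated with Adam at the reported learning rate."""
    return (
        torch.optim.Adam(actor.parameters(), lr=CONFIG["lr"]),
        torch.optim.Adam(critic.parameters(), lr=CONFIG["lr"]),
    )
```

With 20 local epochs per round amounting to roughly 380 gradient steps in the paper's setup, each epoch corresponds to about 19 minibatches of size 256 drawn from a client's local dataset.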