Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Federated Ensemble-Directed Offline Reinforcement Learning

Authors: Desik Rengarajan, Nitin Ragothaman, Dileep Kalathil, Srinivas Shakkottai

NeurIPS 2024

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate FEDORA on a variety of MuJoCo environments and real-world datasets and show that it outperforms several other approaches, including performing offline RL on a pooled dataset. We also demonstrate FEDORA's excellent performance via real-world experiments on a TurtleBot robot [1].
Researcher Affiliation Academia Desik Rengarajan, Nitin Ragothaman, Dileep Kalathil, Srinivas Shakkottai; Department of Electrical and Computer Engineering, Texas A&M University
Pseudocode Yes Algorithm 1: Outline of Client i's Algorithm
Open Source Code Yes We provide our code and a video of our experiments at https://github.com/DesikRengarajan/FEDORA.
Open Datasets Yes We consider the Hopper environment from MuJoCo [28], with |N| = 10, |Di| = 5000, and we use the data from the D4RL dataset [4].
Dataset Splits No The paper describes training and testing procedures and uses the word 'validation' only in a general sense (validating the algorithm's robustness), but it does not specify a validation split (e.g., percentages or sample counts) for its experiments.
Hardware Specification Yes Each run on the MuJoCo environments (as in Fig. 2) takes around 7 hours to complete when run on a single machine (AMD Ryzen Threadripper 3960X 24-Core Processor, 2x NVIDIA 2080Ti GPU).
Software Dependencies No The paper mentions software like the 'PyTorch framework' and the 'Flower federated learning platform [2]' but does not provide specific version numbers for these dependencies. For example: 'We use the PyTorch framework to program the algorithms in this work, based on a publicly-available TD3-BC implementation.'
Experiment Setup Yes We use a discount factor of 0.99, and the clients update their networks using the Adam optimizer with a learning rate of 3 × 10⁻⁴. For training FEDORA, we fixed the decay rate δ = 0.995 and the temperature β = 0.1. The batch size is 256 in both federated and centralized training. During a round of federation, each client performs 20 epochs of local training in all algorithms, which is roughly 380 local gradient steps in our experimental setup.
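The Open Datasets row describes |N| = 10 clients, each holding |Di| = 5000 transitions drawn from D4RL. The minimal sketch below shows only that bookkeeping: splitting a pooled offline dataset into disjoint, equally sized client shards. The helper name `partition_indices` and the pooled size of 100,000 are hypothetical, and how the paper actually assigns D4RL data to clients is not described in this table.

```python
import random
from typing import List


def partition_indices(num_transitions: int, num_clients: int, per_client: int,
                      seed: int = 0) -> List[List[int]]:
    """Split transition indices into disjoint, equally sized client shards.

    Hypothetical helper: it reproduces only the sizes |N| = 10 and |D_i| = 5000,
    not the paper's actual assignment of D4RL data to clients.
    """
    needed = num_clients * per_client
    if needed > num_transitions:
        raise ValueError(f"need {needed} transitions, have {num_transitions}")
    rng = random.Random(seed)
    # Sample without replacement so no transition is shared between clients.
    chosen = rng.sample(range(num_transitions), needed)
    return [chosen[i * per_client:(i + 1) * per_client] for i in range(num_clients)]


# Stand-in pooled dataset size (an assumption, not the real D4RL Hopper size).
shards = partition_indices(num_transitions=100_000, num_clients=10, per_client=5000)
print(len(shards), [len(s) for s in shards][:3])
```

Each shard would then index into the pooled observation, action, reward, and next-observation arrays to form one client's local dataset.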
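The Experiment Setup row states that 20 local epochs correspond to roughly 380 gradient steps. Assuming each client holds |Di| = 5000 transitions (the Hopper setting in the Open Datasets row), takes one gradient step per minibatch, and drops the final partial minibatch, the arithmetic is ⌊5000 / 256⌋ × 20 = 19 × 20 = 380. The sketch below collects the reported hyperparameters in a config object; the class and field names are hypothetical and not taken from the FEDORA repository.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FedoraConfig:
    """Values from the Experiment Setup row; field names are hypothetical."""
    discount: float = 0.99        # discount factor
    learning_rate: float = 3e-4   # Adam learning rate
    decay_rate: float = 0.995     # delta
    temperature: float = 0.1      # beta
    batch_size: int = 256         # same for federated and centralized training
    local_epochs: int = 20        # local epochs per federation round


def local_steps_per_round(dataset_size: int, cfg: FedoraConfig,
                          drop_last: bool = True) -> int:
    """Gradient steps per round, assuming one step per minibatch."""
    if drop_last:
        batches = dataset_size // cfg.batch_size              # partial batch discarded
    else:
        batches = math.ceil(dataset_size / cfg.batch_size)    # partial batch kept
    return batches * cfg.local_epochs


cfg = FedoraConfig()
print(local_steps_per_round(5000, cfg))
print(local_steps_per_round(5000, cfg, drop_last=False))
```

Keeping the partial minibatch would give 20 × 20 = 400 steps, so the reported "roughly 380" is consistent with dropping it, though the table does not say which convention the code uses.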
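The table reports a temperature β = 0.1 but not how it is used. One common way a temperature-style parameter enters ensemble aggregation is a softmax over per-client performance estimates, w_i = exp(β J_i) / Σ_j exp(β J_j), followed by a weighted average of client parameters. The sketch below shows that generic technique under this assumption; the scores and two-parameter "policies" are hypothetical, and it is not a reproduction of FEDORA's server update.

```python
import math
from typing import List, Sequence


def softmax_weights(scores: Sequence[float], beta: float) -> List[float]:
    """w_i = exp(beta * J_i) / sum_j exp(beta * J_j), computed stably.

    Assumption: beta multiplies the scores (this parameterization is illustrative).
    """
    scaled = [beta * s for s in scores]
    m = max(scaled)                          # subtract the max to avoid overflow
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_average(params: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Aggregate flattened client parameter vectors with the given weights."""
    dim = len(params[0])
    return [sum(w * p[k] for w, p in zip(weights, params)) for k in range(dim)]


# Hypothetical per-client performance estimates and flattened parameters.
scores = [10.0, 20.0, 30.0]
weights = softmax_weights(scores, beta=0.1)
global_params = weighted_average([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], weights)
print([round(w, 3) for w in weights], global_params)
```

In this parameterization a smaller β makes the weights more uniform (closer to plain parameter averaging), while a larger β concentrates weight on the best-scoring client.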