Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

FedEL: Federated Elastic Learning for Heterogeneous Devices

Authors: Letian Zhang, Bo Chen, Jieming Bian, Lei Wang, Jie Xu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The experiment results show that FedEL achieves up to 3.87× improvement in time-to-accuracy compared to baselines while maintaining or exceeding final test accuracy. We implement FedEL on both a hardware testbed and software simulations. We evaluate FedEL using various DNN models and four real-world FL datasets across three key tasks: image classification, voice command recognition, and next-word prediction.
Researcher Affiliation	Academia	Letian Zhang Middle Tennessee State University Murfreesboro, TN 37132 EMAIL; Bo Chen Middle Tennessee State University Murfreesboro, TN 37132 EMAIL; Jieming Bian University of Florida Gainesville, FL 32611 EMAIL; Lei Wang University of Florida Gainesville, FL 32611 EMAIL; Jie Xu University of Florida Gainesville, FL 32611 EMAIL
Pseudocode	Yes	A. The Algorithm of FedEL. In this paper, we introduce the sliding window training to address the first limitation and tensor importance adjustment to overcome the second limitation. We present a comprehensive window-based important tensor selection scheme implemented by FedEL, as outlined in Algorithm 1. Specifically, prior to the FL process, each client performs offline tensor time profiling for the DNN model (Lines 3-5), which is done only once. In each online FL round, once the client receives the broadcasted global model, it evaluates the tensor importance for the current global model (Line 8), calculates the global tensor importance (Line 9), and adjusts the local tensor importance accordingly (Line 10). Based on the previous round's training status, FedEL then slides or resets the window to ensure the entire DNN model is trained (Line 11). Once the window is fixed, ElasticTrainer is applied within the window to select important tensors, freeze unselected ones, and train only the selected tensors (Lines 12-13). Finally, the server aggregates the models from all clients and broadcasts the updated global model for the next FL round. Algorithm 1: FedEL
Open Source Code	No	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [NA] Justification: We consider to provide open access to the data and code if the paper is accepted.
Open Datasets	Yes	Image Classification. VGG16 [35] model on CIFAR10 dataset [19] and Tiny ImageNet dataset [23]. Speech Recognition. ResNet50 [11] model on Google command speech dataset [40]. Natural Language Processing. Lightweight Albert [22] model on Reddit dataset [32].
Dataset Splits	No	To follow the realistic non-iid data in FL scenarios, we partition the datasets into different clusters using a Dirichlet distribution with α equals 0.1. ... CIFAR-10 (10-client hardware deployment): We use full participation, where all 10 NVIDIA devices join every training round. ... Tiny-ImageNet, Google Speech Commands, and Reddit (100-client simulation): We adopt partial participation, where 25 clients are randomly selected out of 100 in each round (i.e., 25% participation rate).
Hardware Specification	Yes	The hardware testbed consists of ten NVIDIA Jetson devices connected wirelessly to a server. ... comprising five NVIDIA Jetson Xavier NX kits (Xavier) [2] and five NVIDIA Jetson Orin kits (Orin) [1]... This simulation is conducted on a PC equipped with an NVIDIA 3090 GPU.
Software Dependencies	No	The paper mentions models (VGG16, ResNet50, Albert) and frameworks (FedAvg, ElasticTrainer), but does not provide specific version numbers for any software libraries, programming languages, or operating systems used for the implementation.
Experiment Setup	Yes	Training Setup. The runtime threshold Tth is set based on the full model training time of the faster Orin devices, ensuring all clients complete local training within a similar timeframe. ... For fair comparisons with baseline methods, unless stated otherwise, the runtime threshold Tth is set to the full model training time of the fastest device, and the balance parameter β is fixed at 0.6.
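
Illustrative sketch (not from the paper): the Dataset Splits row quotes a Dirichlet partition with α = 0.1 and a 25-of-100 partial participation scheme, but the paper's own partitioning code is not available. The Python below shows one common way such a setup could be reproduced, assuming a label-wise Dirichlet split. The function names `dirichlet_partition` and `sample_participants`, the seeds, and the toy label list are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Assign sample indices to clients, class by class, using
    Dirichlet(alpha) proportions (assumed label-wise scheme)."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    clients = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)

        # Normalised Gamma(alpha, 1) draws form one Dirichlet(alpha) sample.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(draws) or 1.0  # guard against an all-zero underflow

        # Cut the shuffled indices at the cumulative proportions.
        start, cumulative = 0, 0.0
        for client_id, draw in enumerate(draws):
            cumulative += draw
            if client_id == num_clients - 1:
                end = len(indices)  # last client takes the remainder
            else:
                end = round(cumulative / total * len(indices))
            clients[client_id].extend(indices[start:end])
            start = end
    return clients


def sample_participants(num_clients, num_selected, rng):
    """Pick the clients that join one round, without replacement."""
    return sorted(rng.sample(range(num_clients), num_selected))


# Toy usage: 1000 samples over 10 classes, 100 clients, alpha = 0.1,
# then 25 of the 100 clients selected for a single round.
toy_labels = [i % 10 for i in range(1000)]
partition = dirichlet_partition(toy_labels, num_clients=100, alpha=0.1, seed=1)
round_clients = sample_participants(100, 25, random.Random(0))
```

With a small α such as 0.1, most of each class tends to concentrate on a few clients, which is what makes the split strongly non-IID. Full participation, as quoted for the 10-device CIFAR-10 testbed, corresponds to selecting all clients every round.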