Recurrent Early Exits for Federated Learning with Heterogeneous Clients

Authors: Royson Lee, Javier Fernandez-Marques, Shell Xu Hu, Da Li, Stefanos Laskaridis, Łukasz Dudziak, Timothy Hospedales, Ferenc Huszár, Nicholas Donald Lane

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on standard image and speech classification benchmarks across various emerging federated fine-tuning baselines demonstrate ReeFL's effectiveness over previous works.
Researcher Affiliation | Collaboration | 1 Samsung AI Center, Cambridge, UK; 2 University of Cambridge, Cambridge, UK; 3 Flower Labs, Cambridge, UK; 4 Brave Software, London, UK; 5 University of Edinburgh, Edinburgh, UK.
Pseudocode | No | The paper describes algorithms and methods using prose and mathematical equations but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/royson/reefl.
Open Datasets | Yes | CIFAR-100 (Krizhevsky et al., 2009), FEMNIST (Caldas et al., 2018a), and Speech Commands V2 (Warden, 2018).
Dataset Splits | Yes | For CIFAR-100, we use the default partitions for train and test. Following prior works (Karimireddy et al., 2020; Wang et al., 2020a), we set the number of clients to 100 and partition the data using the latent Dirichlet allocation (LDA) method, y ~ Dir(α), for each client; hence, the lower the α, the greater the degree of data heterogeneity in label distributions (see the partitioning sketch after this table). For FEMNIST, we use the LEAF benchmark's (Caldas et al., 2018a) natural partition, where each client corresponds to its own handwriting, hence non-IID in both feature and label distributions; we use a total of 381 clients. For Speech Commands V2, we adopt the setup from Lee et al. (2023), sample 250 speakers from the training set, and split each speaker's data into 80%/20% train/test sets.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory amounts) used for running the experiments.
Software Dependencies | No | The paper mentions using a DeiT-S model and refers to PEFT methods like LoRA, but it does not specify software dependencies with version numbers (e.g., PyTorch version, CUDA version).
Experiment Setup | Yes | We run each experiment 3 times for 1k rounds, sampling 10% of the total number of clients per round, and report the mean performance of each exit, as well as the mean and standard deviation (SD) of the mean performance of all exits. Each sampled client in an FL round trains its local parameters on its local dataset using SGD for a single epoch with a batch size of 32 (see the training-loop sketch after this table). We ran a simple grid search to pick the highest-performing learning rate (LR) [1e-1, 5e-2, 1e-2, 5e-3, 1e-3], weight decay [0, 1e-2, 1e-3, 1e-4], and minimum LR after LR decay using a cosine annealing LR schedule [1e-2, 1e-3, 1e-4, 1e-5] for each baseline. More details, along with the hyperparameters of different baselines, aggregation, and PEFT methods, can be found in Appendix Section A.
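
The Dataset Splits row describes latent Dirichlet allocation (LDA) partitioning with y ~ Dir(α). Below is a minimal sketch of such a label-skew partitioner; it is not the authors' implementation, and the function name `lda_partition` and its arguments are illustrative.

```python
# Minimal sketch of Dirichlet (LDA) label partitioning across clients.
# Not the authors' code; lda_partition and its arguments are illustrative.
import numpy as np

def lda_partition(labels, num_clients=100, alpha=1.0, seed=0):
    """Return one list of sample indices per client, with per-class
    client proportions drawn from Dir(alpha)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # Smaller alpha -> more skewed per-client label distributions.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, shard in enumerate(np.split(idx, cuts)):
            client_indices[client_id].extend(shard.tolist())
    return client_indices

# e.g., partition CIFAR-100 training labels across 100 clients:
# parts = lda_partition(cifar100_train_labels, num_clients=100, alpha=0.1)
```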
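
The Experiment Setup row similarly lends itself to a short sketch. The following is an assumed outline, not the authors' code, of a single federated round: sample 10% of clients, run one local epoch of SGD at batch size 32, and decay the learning rate towards a minimum LR with cosine annealing over rounds. All names (`run_round`, `cosine_lr`, the `client.dataset` attribute) and the default hyperparameter values are hypothetical, drawn from the search ranges quoted above.

```python
# Hedged sketch of one federated round under the setup above; names are illustrative.
import copy
import math
import random
import torch
from torch.utils.data import DataLoader

def cosine_lr(round_idx, total_rounds=1000, base_lr=5e-2, min_lr=1e-4):
    # Cosine-anneal the LR over federated rounds, decaying base_lr towards min_lr.
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * round_idx / total_rounds))

def run_round(global_model, clients, round_idx, sample_frac=0.1, weight_decay=1e-3):
    sampled = random.sample(clients, max(1, int(sample_frac * len(clients))))  # 10% of clients
    lr = cosine_lr(round_idx)
    local_states = []
    for client in sampled:
        model = copy.deepcopy(global_model)
        opt = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=weight_decay)
        model.train()
        for x, y in DataLoader(client.dataset, batch_size=32, shuffle=True):  # one local epoch
            opt.zero_grad()
            loss = torch.nn.functional.cross_entropy(model(x), y)
            loss.backward()
            opt.step()
        local_states.append(model.state_dict())
    return local_states  # the server then aggregates these (e.g., via FedAvg) into the next global model
```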