Recurrent Early Exits for Federated Learning with Heterogeneous Clients
Authors: Royson Lee, Javier Fernandez-Marques, Shell Xu Hu, Da Li, Stefanos Laskaridis, Łukasz Dudziak, Timothy Hospedales, Ferenc Huszár, Nicholas Donald Lane
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on standard image and speech classification benchmarks across various emerging federated fine-tuning baselines demonstrate ReeFL's effectiveness over previous works. |
| Researcher Affiliation | Collaboration | Samsung AI Center, Cambridge, UK; University of Cambridge, Cambridge, UK; Flower Labs, Cambridge, UK; Brave Software, London, UK; University of Edinburgh, Edinburgh, UK. |
| Pseudocode | No | The paper describes algorithms and methods using prose and mathematical equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/royson/reefl. |
| Open Datasets | Yes | CIFAR-100 (Krizhevsky et al., 2009). FEMNIST (Caldas et al., 2018a). Speech Commands V2 (Warden, 2018). |
| Dataset Splits | Yes | We use the default partitions for train and test. Following prior works (Karimireddy et al., 2020; Wang et al., 2020a), we set the number of clients to 100 and partition the data using the latent Dirichlet allocation (LDA) method: y ∼ Dir(α) for each client. Hence, the lower the α, the greater the degree of data heterogeneity in label distributions. We use the LEAF benchmark's (Caldas et al., 2018a) natural partition, in which each client corresponds to a distinct writer's handwriting, hence non-IID in both feature and label distributions. We use a total of 381 clients. We adopt the setup from Lee et al. (2023), sample 250 speakers from the training set, and split each speaker's data into 80%/20% train/test sets. (A partitioning sketch appears after this table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory amounts) used for running the experiments. |
| Software Dependencies | No | The paper mentions using a DeiT-S model and refers to PEFT methods like LoRA, but it does not specify software dependencies with version numbers (e.g., PyTorch version, CUDA version). |
| Experiment Setup | Yes | We run each experiment 3 times for 1k rounds, sampling 10% of the total number of clients per round, and report the mean performance of each exit, as well as the mean and standard deviation (SD) of the mean performance across all exits. Each sampled client in an FL round trains its local parameters on its local dataset using SGD for a single epoch with a batch size of 32. We ran a simple grid search to pick the highest-performing learning rate (LR) [1e-1, 5e-2, 1e-2, 5e-3, 1e-3], weight decay [0, 1e-2, 1e-3, 1e-4], and minimum LR after LR decay using a cosine annealing LR schedule [1e-2, 1e-3, 1e-4, 1e-5] for each baseline. More details, along with the hyperparameters of different baselines, aggregation, and PEFT methods, can be found in Appendix Section A. (A local-training sketch appears after this table.) |
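The LDA split quoted in the Dataset Splits row can be illustrated with a short sketch. This is a minimal version assuming the per-class Dirichlet formulation commonly used in the cited prior works (Karimireddy et al., 2020; Wang et al., 2020a); `lda_partition` and all names below are hypothetical and not taken from the ReeFL repository.

```python
# Hypothetical sketch of LDA (Dirichlet) partitioning of labels across clients.
import numpy as np

def lda_partition(labels, num_clients=100, alpha=0.5, seed=0):
    """Partition sample indices so each class is shared across clients
    according to a Dir(alpha) draw; lower alpha -> more heterogeneity."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    num_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        idx = rng.permutation(np.where(labels == c)[0])
        # One Dirichlet draw per class decides how that class is split out.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, shard in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(shard.tolist())
    return client_indices

# Example: 100 CIFAR-100-like clients with alpha = 0.1 (highly non-IID).
fake_labels = np.random.randint(0, 100, size=50_000)
parts = lda_partition(fake_labels, num_clients=100, alpha=0.1)
print(len(parts), sum(len(p) for p in parts))
```

With small α each client ends up dominated by a few classes; with very large α the split approaches IID, matching the paper's note that lower α means greater label heterogeneity.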
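Likewise, the local update in the Experiment Setup row amounts to a standard FedAvg-style client epoch. The sketch below assumes a PyTorch model and DataLoader; `local_epoch`, `cosine_lr`, and their signatures are illustrative, not the paper's API.

```python
# Hypothetical sketch of one client's local update under the stated setup:
# SGD, a single local epoch, batch size 32, cosine-annealed LR with a floor.
import math
import torch

def cosine_lr(base_lr, min_lr, current_round, total_rounds):
    """Cosine annealing from base_lr down to min_lr over the FL rounds."""
    return min_lr + 0.5 * (base_lr - min_lr) * (
        1 + math.cos(math.pi * current_round / total_rounds))

def local_epoch(model, loader, current_round, base_lr=5e-2,
                weight_decay=1e-4, min_lr=1e-4, total_rounds=1000):
    # Values grid-searched per baseline in the paper:
    #   base_lr in [1e-1, 5e-2, 1e-2, 5e-3, 1e-3]
    #   weight_decay in [0, 1e-2, 1e-3, 1e-4]
    #   min_lr in [1e-2, 1e-3, 1e-4, 1e-5]
    lr = cosine_lr(base_lr, min_lr, current_round, total_rounds)
    opt = torch.optim.SGD(model.parameters(), lr=lr,
                          weight_decay=weight_decay)
    criterion = torch.nn.CrossEntropyLoss()
    model.train()
    for x, y in loader:  # DataLoader with batch_size=32; one local epoch
        opt.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        opt.step()
    return model.state_dict()
```

Returning the client's `state_dict` leaves aggregation (e.g., FedAvg weighting by local dataset size) to the server loop, which per the quoted setup samples 10% of clients each of the 1k rounds.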