Guiding The Last Layer in Federated Learning with Pre-Trained Models

Authors: Gwen Legate, Nicolas Bernier, Lucas Page-Caccia, Edouard Oyallon, Eugene Belilovsky

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We provide empirical evidence that, for numerous downstream datasets, training only the classifier head proves to be an effective approach in FL settings."
Researcher Affiliation | Academia | Gwen Legate (Concordia University, Mila, Montreal, Canada; gwendolyne.legate@mila.quebec); Nicolas Bernier (Concordia University, Mila, Montreal, Canada); Lucas Caccia (McGill University, Mila, Montreal, Canada); Edouard Oyallon (Sorbonne University, ISIR, CNRS, Paris, France); Eugene Belilovsky (Concordia University, Mila, Montreal, Canada)
Pseudocode | Yes | Algorithm 1 (FedNCM), reconstructed below; a runnable Python sketch of it follows the table.

    Algorithm 1 FedNCM. K is the total number of clients, C is the number of
    classes in the training dataset, and D_c is the total number of samples
    of class c.
    Require: local datasets (X_1, Y_1), (X_2, Y_2), ..., (X_K, Y_K); pre-trained model w_pt
    Server Executes:
    1: for each client k ∈ K in parallel do
    2:     [m_c^k]_{c ∈ C} ← LocalClientStats(X_k, Y_k, w_pt)   (send to all clients, receive weighted class means)
    3: end for
    4: for each class c ∈ C do
    5:     l_c ← (1 / D_c) · Σ_{k=1}^{K} m_c^k                  (l_c can be used in an NCM classifier)
    6: end for
    Client Side:
    7: function LocalClientStats(X, Y, w)
    8:     for each class c ∈ C do
    9:         let X_c = {x_i ∈ X : y_i = c}
    10:        m_c ← Σ_{x ∈ X_c} f_w(x)
    11:    end for
    12:    return [m_c]_{c ∈ C}
    13: end function
Open Source Code | Yes | "Code for our experiments is available." Repository: https://github.com/GwenLegate/GuidingLastLayerFLPretrain
Open Datasets | Yes | "We consider a setting similar to Nguyen et al. [2023] using the CIFAR-10 dataset [Krizhevsky, 2009] and expand our setting to include four additional standard computer vision datasets shown in Tab. 1."
Dataset Splits | No | The paper does not explicitly provide the training/validation/test splits needed to reproduce the experiments: no percentages, no sample counts, and no description of how the datasets were divided for validation.
Hardware Specification | Yes | "We use a combination of NVIDIA A100-SXM4-40GB, NVIDIA RTX A4500, Tesla V100-SXM2-32GB and Tesla P100-PCIE-12GB GPUs for a total of 1.1 GPU years."
Software Dependencies | No | The paper mentions software such as the FLSim library and the DistilBERT model but does not provide version numbers for these or for other dependencies such as Python or the deep learning framework.
Experiment Setup | Yes | "We set the number of clients to 100, train for 1 local epoch per round, and set client participation to 30% for CIFAR (as in Nguyen et al. [2023]). For all other datasets we use full client participation for simplicity." (A hedged sketch of this setup appears after the table.)
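
To make the Pseudocode row concrete, here is a minimal runnable PyTorch sketch of Algorithm 1 (FedNCM): each client returns per-class feature sums of a frozen pre-trained backbone, and the server normalizes them into global class means usable as a nearest-class-mean classifier. Function names (local_client_stats, fed_ncm, ncm_predict) and the toy backbone are illustrative assumptions, not the authors' code; see their repository for the real implementation.

    import torch

    def local_client_stats(X, Y, backbone, num_classes):
        """Client side: per-class feature sums m_c and sample counts."""
        with torch.no_grad():
            feats = backbone(X)                      # f_w(x) for every local sample
        sums = torch.zeros(num_classes, feats.shape[1])
        counts = torch.zeros(num_classes)
        for c in range(num_classes):
            mask = (Y == c)
            sums[c] = feats[mask].sum(dim=0)         # m_c = sum over X_c of f_w(x)
            counts[c] = mask.float().sum()
        return sums, counts

    def fed_ncm(client_datasets, backbone, num_classes):
        """Server side: aggregate client sums into global class means l_c."""
        total_sums, total_counts = None, torch.zeros(num_classes)
        for X, Y in client_datasets:                 # "for each client k in parallel"
            sums, counts = local_client_stats(X, Y, backbone, num_classes)
            total_sums = sums if total_sums is None else total_sums + sums
            total_counts += counts
        # l_c = (1 / D_c) * sum_k m_c^k; clamp guards against empty classes
        return total_sums / total_counts.clamp(min=1).unsqueeze(1)

    def ncm_predict(x, backbone, class_means):
        """NCM classification: nearest class mean in feature space."""
        with torch.no_grad():
            f = backbone(x)
        return torch.cdist(f, class_means).argmin(dim=1)

    if __name__ == "__main__":
        torch.manual_seed(0)
        # Toy stand-in for the pre-trained feature extractor f_w
        backbone = torch.nn.Sequential(torch.nn.Linear(32, 16), torch.nn.ReLU())
        clients = [(torch.randn(50, 32), torch.randint(0, 10, (50,)))
                   for _ in range(5)]
        means = fed_ncm(clients, backbone, num_classes=10)
        print(ncm_predict(torch.randn(4, 32), backbone, means))

Note that this requires only one communication round and no gradient computation, which is the appeal of FedNCM as an initialization or standalone classifier.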
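The Experiment Setup row can likewise be sketched as head-only federated training ("training only the classifier head") under the quoted configuration: 100 clients, 1 local epoch per round, 30% participation. This is generic FedAvg over the classifier head with a frozen backbone, written for illustration; it is an assumption, not the authors' FLSim configuration, and all names below are hypothetical.

    import copy, random
    import torch

    NUM_CLIENTS, PARTICIPATION, LOCAL_EPOCHS = 100, 0.3, 1

    def local_update(global_head, backbone, data, lr=0.01):
        """One client: train only the head on frozen backbone features."""
        head = copy.deepcopy(global_head)            # start from the server head
        opt = torch.optim.SGD(head.parameters(), lr=lr)
        loss_fn = torch.nn.CrossEntropyLoss()
        for _ in range(LOCAL_EPOCHS):                # 1 local epoch per round
            for X, Y in data:
                with torch.no_grad():
                    feats = backbone(X)              # backbone stays frozen
                opt.zero_grad()
                loss_fn(head(feats), Y).backward()
                opt.step()
        return head.state_dict(), sum(len(Y) for _, Y in data)

    def fedavg_round(head, backbone, client_loaders):
        """Server: sample 30% of clients, average head weights by data size."""
        k = max(1, int(PARTICIPATION * len(client_loaders)))
        picked = random.sample(client_loaders, k)
        results = [local_update(head, backbone, d) for d in picked]
        total = sum(n for _, n in results)
        avg = {key: sum(n / total * sd[key] for sd, n in results)
               for key in results[0][0]}
        head.load_state_dict(avg)
        return head

    if __name__ == "__main__":
        torch.manual_seed(0)
        backbone = torch.nn.Sequential(torch.nn.Linear(32, 16), torch.nn.ReLU())
        head = torch.nn.Linear(16, 10)
        loaders = [[(torch.randn(20, 32), torch.randint(0, 10, (20,)))]
                   for _ in range(NUM_CLIENTS)]
        for _ in range(3):                           # a few FedAvg rounds
            head = fedavg_round(head, backbone, loaders)

For non-CIFAR datasets the paper uses full participation, which here would mean PARTICIPATION = 1.0.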