CoBo: Collaborative Learning via Bilevel Optimization

Authors: Diba Hashemi, Lie He, Martin Jaggi

NeurIPS 2024 | Conference PDF | Archive PDF

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, CoBo achieves superior performance, surpassing popular personalization algorithms by 9.3% in accuracy on a task with high heterogeneity, involving datasets distributed among 80 clients. We present three experiments to demonstrate the practical effectiveness of CoBo.
Researcher Affiliation | Collaboration | Diba Hashemi, EPFL, diba.hashemi@epfl.ch; Lie He, Tencent Inc., liam.he15@gmail.com; Martin Jaggi, EPFL, martin.jaggi@epfl.ch
Pseudocode | Yes | Algorithm 1 CoBo: Collaborative Learning via Bilevel Optimization
Open Source Code | Yes | The code is available at: https://github.com/epfml/CoBo.
Open Datasets | Yes | using the CIFAR-100 dataset for multi-task learning [21]... subsets of the Wiki-40B dataset [12]
Dataset Splits | No | The paper describes how data is distributed among clients for collaborative learning tasks and mentions relying on 'validation performance' for some baselines, but it does not provide explicit training/validation/test dataset split percentages or counts.
Hardware Specification | Yes | For cross-silo experiments we employed a single NVIDIA V-100 GPU with 32GB memory, and moved to four NVIDIA V-100 GPUs with 32GB memory for the cross-device experiment. Training is performed on a single NVIDIA A-100 GPU with 40GB memory.
Software Dependencies | No | The paper mentions models and architectures like ResNet-9, GPT-2, and LoRA, but it does not provide specific software dependencies or library names with their version numbers (e.g., PyTorch 1.9, Python 3.8).
Experiment Setup | Yes | We use a fixed batch size of 128 for the cross-device and cross-silo experiments on CIFAR-100. We tune each method for the optimal learning rate individually: we use a learning rate of 0.1 for Ditto, 0.05 for Federated Clustering (FC), and 0.01 for all other methods. For the language modeling experiment, we conducted the experiments with a learning rate of 0.002, batch size of 50, and 4 accumulation steps. We also used a context length of 512, a dropout rate of 0.1, and a LoRA module with rank 4.
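
For reference, the reported hyperparameters can be collected into a single configuration sketch. This is a minimal illustration only: the dictionary structure and field names below are assumptions, and just the numeric values are taken from the quoted experiment setup.

# Hypothetical summary of the reported hyperparameters; the structure and
# key names are illustrative, only the values come from the paper's setup.

CIFAR100_CONFIG = {
    "batch_size": 128,                 # cross-device and cross-silo experiments
    "learning_rate": {
        "ditto": 0.1,
        "federated_clustering": 0.05,  # FC baseline
        "other_methods": 0.01,         # all remaining methods
    },
}

LANGUAGE_MODELING_CONFIG = {
    "learning_rate": 0.002,
    "batch_size": 50,
    "gradient_accumulation_steps": 4,  # "4 accumulation steps"
    "context_length": 512,
    "dropout": 0.1,
    "lora_rank": 4,
}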