Divergence-aware Federated Self-Supervised Learning
Authors: Weiming Zhuang, Yonggang Wen, Shuai Zhang
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We aim to fill in this gap via an in-depth empirical study and propose a new method to tackle the non-independently and identically distributed (non-IID) data problem of decentralized data. Firstly, we introduce a generalized FedSSL framework that embraces existing SSL methods based on Siamese networks and presents flexibility catering to future methods. In this framework, a server coordinates multiple clients to conduct SSL training and periodically updates local models of clients with the aggregated global model. Using the framework, our study uncovers unique insights of FedSSL: 1) the stop-gradient operation, previously reported to be essential, is not always necessary in FedSSL; 2) retaining local knowledge of clients in FedSSL is particularly beneficial for non-IID data. Extensive experiments demonstrate that FedEMA outperforms existing methods by 3-4% on linear evaluation. (A minimal sketch of such a divergence-aware federated update appears after the table.) |
| Researcher Affiliation | Collaboration | Weiming Zhuang (1,3), Yonggang Wen (2), Shuai Zhang (3); (1) S-Lab, NTU, Singapore; (2) NTU, Singapore; (3) SenseTime Research |
| Pseudocode | Yes | Algorithm 1: Our proposed FedEMA |
| Open Source Code | No | Moreover, we plan to open-source the codes in the future. |
| Open Datasets | Yes | Datasets We conduct experiments using CIFAR-10 and CIFAR-100 datasets (Krizhevsky et al., 2009). |
| Dataset Splits | Yes | Datasets: We conduct experiments using the CIFAR-10 and CIFAR-100 datasets (Krizhevsky et al., 2009). Both datasets consist of 50,000 training images and 10,000 testing images. CIFAR-10 contains 10 classes, each with 5,000 training images and 1,000 testing images, while CIFAR-100 contains 100 classes, each with 500 training images and 100 testing images. To simulate federated learning, we equally split the training set into K clients. We first obtain a trained encoder (or learned representations) using the full training set for linear evaluation and 99% or 90% of the training set for semi-supervised learning (excluding the 1% or 10% reserved for fine-tuning). (A sketch of this class-based non-IID split appears after the table.) |
| Hardware Specification | Yes | To simulate federated learning, we train each client on one NVIDIA V100 GPU. |
| Software Dependencies | No | The paper mentions using 'Python', 'PyTorch (Paszke et al., 2017)', and 'EasyFL (Zhuang et al., 2022)'. However, it does not provide version numbers for Python, PyTorch, or EasyFL, which would be needed to reproduce the software environment. |
| Experiment Setup | Yes | By default, we train for R = 100 rounds with K = 5 clients, E = 5 local epochs, batch size B = 128, learning rate η = 0.032 with cosine decay, and non-IID data l = 2 (l = 20) for CIFAR-10 (CIFAR-100). We use Stochastic Gradient Descent (SGD) as the optimizer in training. We use η = 0.032 as the initial learning rate and decay the learning rate with cosine annealing (Loshchilov & Hutter, 2017), which is also used in SimSiam. (A sketch of this optimizer and schedule appears after the table.) |
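
The "Research Type" row describes a server that aggregates client models each round and periodically updates clients' local models with the aggregated global model, with FedEMA retaining local knowledge. The sketch below shows one way such a round-level update could look in PyTorch; the FedAvg-style aggregation, the EMA blend, the divergence-scaled coefficient `mu`, and all function names and the `lam` hyperparameter are our illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch only: FedAvg-style aggregation plus a divergence-aware EMA
# blend of global weights into a client's local model. The blend rule and `lam`
# are assumptions for illustration, not the paper's released code.
from copy import deepcopy
import torch


def weight_divergence(global_model, local_model):
    """L2 distance between the flattened global and local parameters."""
    g = torch.cat([p.detach().flatten() for p in global_model.parameters()])
    l = torch.cat([p.detach().flatten() for p in local_model.parameters()])
    return torch.norm(g - l).item()


def divergence_aware_update(global_model, local_model, lam=1.0):
    """Blend the aggregated global weights into a client's local model.

    A larger divergence gives a larger mu, so the client keeps more of its own
    local knowledge, in line with the reported benefit under non-IID data.
    """
    mu = min(lam * weight_divergence(global_model, local_model), 1.0)
    with torch.no_grad():
        for p_local, p_global in zip(local_model.parameters(),
                                     global_model.parameters()):
            p_local.mul_(mu).add_((1.0 - mu) * p_global)
    return mu


def aggregate(client_models, client_sizes):
    """Weighted (FedAvg-style) average of client weights into a global model."""
    global_model = deepcopy(client_models[0])
    total = float(sum(client_sizes))
    with torch.no_grad():
        for name, p_global in global_model.named_parameters():
            weighted = [dict(m.named_parameters())[name].detach() * (n / total)
                        for m, n in zip(client_models, client_sizes)]
            p_global.copy_(torch.stack(weighted).sum(dim=0))
    return global_model
```

In a full round, each client would run E local epochs of self-supervised training, the server would call `aggregate` on the uploaded models, and `divergence_aware_update` would be applied when the new global model is sent back to each client.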
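The "Dataset Splits" row describes equally splitting the CIFAR training set into K clients with non-IID degree l (classes per client). Below is a minimal sketch of one way to realize such a class-based split; the function and argument names are ours, and it assumes K * l equals the number of classes, which holds for the paper's defaults (K = 5 with l = 2 on CIFAR-10 and l = 20 on CIFAR-100).

```python
# Illustrative class-based non-IID partition: split a labelled training set equally
# across num_clients clients, giving each client classes_per_client whole classes.
# Assumes num_clients * classes_per_client equals the number of classes.
import random
from collections import defaultdict


def class_noniid_split(labels, num_clients=5, classes_per_client=2, seed=0):
    """Return one list of sample indices per client."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    classes = sorted(by_class)
    assert num_clients * classes_per_client == len(classes)
    rng.shuffle(classes)

    client_indices = []
    for k in range(num_clients):
        owned = classes[k * classes_per_client:(k + 1) * classes_per_client]
        idxs = [i for c in owned for i in by_class[c]]
        rng.shuffle(idxs)
        client_indices.append(idxs)
    return client_indices
```

With K = 5 and l = 2 on CIFAR-10, each client receives two whole classes, i.e. 10,000 of the 50,000 training images, so the split is equal in size while remaining class-skewed.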
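The "Experiment Setup" row lists SGD with an initial learning rate of 0.032 decayed by cosine annealing, R = 100 rounds, E = 5 local epochs, and batch size B = 128. The sketch below wires up that optimizer and schedule in PyTorch; the stand-in encoder, the momentum value, and stepping the scheduler once per local epoch over R * E epochs are our assumptions.

```python
# Illustrative sketch of the stated optimizer and learning-rate schedule: SGD with
# an initial lr of 0.032 and cosine-annealed decay. The stand-in encoder, the
# momentum value, and per-epoch scheduler stepping are assumptions.
import torch
import torch.nn as nn

R, E, B, LR = 100, 5, 128, 0.032          # rounds, local epochs, batch size, initial lr

encoder = nn.Sequential(nn.Flatten(),     # stand-in for the SSL backbone
                        nn.Linear(3 * 32 * 32, 512), nn.ReLU(),
                        nn.Linear(512, 128))
optimizer = torch.optim.SGD(encoder.parameters(), lr=LR, momentum=0.9)  # momentum assumed
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=R * E)

for epoch in range(R * E):
    # ... one local epoch of self-supervised training at batch size B goes here ...
    scheduler.step()                      # cosine-anneal the learning rate
```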