Divergence-aware Federated Self-Supervised Learning
Authors: Weiming Zhuang, Yonggang Wen, Shuai Zhang
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We aim to fill in this gap via an in-depth empirical study and propose a new method to tackle the non-independently and identically distributed (non-IID) data problem of decentralized data. Firstly, we introduce a generalized FedSSL framework that embraces existing SSL methods based on Siamese networks and presents flexibility catering to future methods. In this framework, a server coordinates multiple clients to conduct SSL training and periodically updates local models of clients with the aggregated global model. Using the framework, our study uncovers unique insights of FedSSL: 1) the stop-gradient operation, previously reported to be essential, is not always necessary in FedSSL; 2) retaining local knowledge of clients in FedSSL is particularly beneficial for non-IID data. Extensive experiments demonstrate that FedEMA outperforms existing methods by 3-4% on linear evaluation. (A minimal sketch of such a divergence-aware federated update appears after the table.) |
| Researcher Affiliation | Collaboration | Weiming Zhuang (1,3), Yonggang Wen (2), Shuai Zhang (3); (1) S-Lab, NTU, Singapore; (2) NTU, Singapore; (3) SenseTime Research |
| Pseudocode | Yes | Algorithm 1: Our proposed FedEMA |
| Open Source Code | No | Moreover, we plan to open-source the codes in the future. |
| Open Datasets | Yes | Datasets We conduct experiments using CIFAR-10 and CIFAR-100 datasets (Krizhevsky et al., 2009). |
| Dataset Splits | Yes | Datasets: We conduct experiments using the CIFAR-10 and CIFAR-100 datasets (Krizhevsky et al., 2009). Both datasets consist of 50,000 training images and 10,000 testing images. CIFAR-10 contains 10 classes, each with 5,000 training images and 1,000 testing images, while CIFAR-100 contains 100 classes, each with 500 training images and 100 testing images. To simulate federated learning, we equally split the training set into K clients. We first obtain a trained encoder (or learned representations) using the full training set for linear evaluation and 99% or 90% of the training set for semi-supervised learning (excluding the 1% or 10% reserved for fine-tuning). (A sketch of this class-based non-IID split appears after the table.) |
| Hardware Specification | Yes | To simulate federated learning, we train each client on one NVIDIA V100 GPU. |
| Software Dependencies | No | The paper mentions using 'Python', 'PyTorch (Paszke et al., 2017)', and 'EasyFL (Zhuang et al., 2022)'. However, it does not provide version numbers for Python, PyTorch, or EasyFL, which would be needed to reproduce the software environment. |
| Experiment Setup | Yes | By default, we train for R = 100 rounds with K = 5 clients, E = 5 local epochs, batch size B = 128, learning rate η = 0.032 with cosine decay, and non-IID data l = 2 (l = 20) for CIFAR-10 (CIFAR-100). We use Stochastic Gradient Descent (SGD) as the optimizer in training. We use η = 0.032 as the initial learning rate and decay the learning rate with cosine annealing (Loshchilov & Hutter, 2017), which is also used in SimSiam. (A sketch of this optimizer and schedule appears after the table.) |
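
The "Research Type" row describes a server that aggregates client models each round and periodically updates clients' local models with the aggregated global model, with FedEMA retaining local knowledge. The sketch below shows one way such a round-level update could look in PyTorch; the FedAvg-style aggregation, the EMA blend, the divergence-scaled coefficient `mu`, and all function names and the `lam` hyperparameter are our illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch only: FedAvg-style aggregation plus a divergence-aware EMA
# blend of global weights into a client's local model. The blend rule and `lam`
# are assumptions for illustration, not the paper's released code.
from copy import deepcopy
import torch


def weight_divergence(global_model, local_model):
    """L2 distance between the flattened global and local parameters."""
    g = torch.cat([p.detach().flatten() for p in global_model.parameters()])
    l = torch.cat([p.detach().flatten() for p in local_model.parameters()])
    return torch.norm(g - l).item()


def divergence_aware_update(global_model, local_model, lam=1.0):
    """Blend the aggregated global weights into a client's local model.

    A larger divergence gives a larger mu, so the client keeps more of its own
    local knowledge, in line with the reported benefit under non-IID data.
    """
    mu = min(lam * weight_divergence(global_model, local_model), 1.0)
    with torch.no_grad():
        for p_local, p_global in zip(local_model.parameters(),
                                     global_model.parameters()):
            p_local.mul_(mu).add_((1.0 - mu) * p_global)
    return mu


def aggregate(client_models, client_sizes):
    """Weighted (FedAvg-style) average of client weights into a global model."""
    global_model = deepcopy(client_models[0])
    total = float(sum(client_sizes))
    with torch.no_grad():
        for name, p_global in global_model.named_parameters():
            weighted = [dict(m.named_parameters())[name].detach() * (n / total)
                        for m, n in zip(client_models, client_sizes)]
            p_global.copy_(torch.stack(weighted).sum(dim=0))
    return global_model
```

In a full round, each client would run E local epochs of self-supervised training, the server would call `aggregate` on the uploaded models, and `divergence_aware_update` would be applied when the new global model is sent back to each client.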
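The "Dataset Splits" row describes equally splitting the CIFAR training set into K clients with non-IID degree l (classes per client). Below is a minimal sketch of one way to realize such a class-based split; the function and argument names are ours, and it assumes K * l equals the number of classes, which holds for the paper's defaults (K = 5 with l = 2 on CIFAR-10 and l = 20 on CIFAR-100).

```python
# Illustrative class-based non-IID partition: split a labelled training set equally
# across num_clients clients, giving each client classes_per_client whole classes.
# Assumes num_clients * classes_per_client equals the number of classes.
import random
from collections import defaultdict


def class_noniid_split(labels, num_clients=5, classes_per_client=2, seed=0):
    """Return one list of sample indices per client."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    classes = sorted(by_class)
    assert num_clients * classes_per_client == len(classes)
    rng.shuffle(classes)

    client_indices = []
    for k in range(num_clients):
        owned = classes[k * classes_per_client:(k + 1) * classes_per_client]
        idxs = [i for c in owned for i in by_class[c]]
        rng.shuffle(idxs)
        client_indices.append(idxs)
    return client_indices
```

With K = 5 and l = 2 on CIFAR-10, each client receives two whole classes, i.e. 10,000 of the 50,000 training images, so the split is equal in size while remaining class-skewed.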
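The "Experiment Setup" row lists SGD with an initial learning rate of 0.032 decayed by cosine annealing, R = 100 rounds, E = 5 local epochs, and batch size B = 128. The sketch below wires up that optimizer and schedule in PyTorch; the stand-in encoder, the momentum value, and stepping the scheduler once per local epoch over R * E epochs are our assumptions.

```python
# Illustrative sketch of the stated optimizer and learning-rate schedule: SGD with
# an initial lr of 0.032 and cosine-annealed decay. The stand-in encoder, the
# momentum value, and per-epoch scheduler stepping are assumptions.
import torch
import torch.nn as nn

R, E, B, LR = 100, 5, 128, 0.032          # rounds, local epochs, batch size, initial lr

encoder = nn.Sequential(nn.Flatten(),     # stand-in for the SSL backbone
                        nn.Linear(3 * 32 * 32, 512), nn.ReLU(),
                        nn.Linear(512, 128))
optimizer = torch.optim.SGD(encoder.parameters(), lr=LR, momentum=0.9)  # momentum assumed
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=R * E)

for epoch in range(R * E):
    # ... one local epoch of self-supervised training at batch size B goes here ...
    scheduler.step()                      # cosine-anneal the learning rate
```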