Resource-Aware Federated Self-Supervised Learning with Global Class Representations
Authors: Mingyi Li, Xiao Zhang, Qi Wang, Tengfei Liu, Ruofan Wu, Weiqiang Wang, Fuzhen Zhuang, Hui Xiong, Dongxiao Yu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments conducted on two datasets demonstrate the effectiveness of FedMKD, which outperforms state-of-the-art baselines by 4.78% under linear evaluation on average. |
| Researcher Affiliation | Collaboration | Mingyi Li (Shandong University); Xiao Zhang (Shandong University); Qi Wang (Shandong University); Tengfei Liu (Hong Kong University of Science and Technology); Ruofan Wu (Coupang); Weiqiang Wang (Shanghai Jiaotong University); Fuzhen Zhuang (Institute of Artificial Intelligence, Beihang University; Zhongguancun Laboratory); Hui Xiong (Thrust of AI, HKUST (Guangzhou); Dept. of Computer Science and Engineering, HKUST); Dongxiao Yu (Shandong University) |
| Pseudocode | Yes | Algorithm 1: Algorithm of FedMKD |
| Open Source Code | Yes | Code is available at https://github.com/limee-sdu/FedMKD. |
| Open Datasets | Yes | We use CIFAR-10 and CIFAR-100 [13] datasets to train all the models. |
| Dataset Splits | Yes | Both of them contain 50,000 training images and 10,000 testing images. To construct the public dataset, we sample 4000 data samples from the training set, then divide the remaining data into N partitions to simulate N clients. |
| Hardware Specification | Yes | In this work, we use NVIDIA GeForce RTX 3090 cards with 24GB memory as the server and the clients. |
| Software Dependencies | No | We implement all the methods in Python using EasyFL [32] based on PyTorch. |
| Experiment Setup | Yes | The hyper-parameter γ in the loss of the global model is set to 0.9. During the training process, each client trains locally for T = 5 epochs while the server also distills for T = 5 epochs. Finally, we set the target decay rate α = 0.99, with a batch size of B = 128, and utilize SGD for optimization with a learning rate of η = 0.032. |
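The dataset-split row above (4,000 public samples drawn from the CIFAR training set, with the remaining data divided into N client partitions) can be sketched roughly as follows. This is a minimal illustration, not the paper's preprocessing code: the function name, the seed, and the IID shuffling are assumptions made here for clarity.

```python
import numpy as np
from torchvision.datasets import CIFAR10

def split_public_and_clients(train_set, num_public=4000, num_clients=10, seed=0):
    """Sample a small public set and partition the rest across clients.

    Minimal sketch of the split described in the paper (4,000 public
    samples, remaining data divided into N client shards). The IID
    shuffling here is an assumption; the paper's actual partitions may
    be non-IID.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(train_set))
    public_idx = indices[:num_public]
    client_idx = np.array_split(indices[num_public:], num_clients)
    return public_idx, client_idx

# Example on CIFAR-10 (50,000 training / 10,000 testing images).
train_set = CIFAR10(root="./data", train=True, download=True)
public_idx, client_idx = split_public_and_clients(train_set)
print(len(public_idx), [len(c) for c in client_idx])
```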
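The hyper-parameters quoted in the experiment-setup row (γ = 0.9 for the global model's loss, T = 5 local and server distillation epochs, target decay rate α = 0.99, batch size B = 128, SGD with learning rate η = 0.032) could be gathered into a configuration object like the sketch below. The field names and the `make_optimizer` helper are hypothetical; only the numeric values come from the paper.

```python
from dataclasses import dataclass

import torch

@dataclass
class FedMKDConfig:
    """Hyper-parameters quoted in the experiment setup; field names are assumed."""
    gamma: float = 0.9          # weight in the global model's loss
    local_epochs: int = 5       # T: client-side training epochs per round
    distill_epochs: int = 5     # T: server-side distillation epochs per round
    target_decay: float = 0.99  # alpha: EMA decay rate for the target network
    batch_size: int = 128       # B
    lr: float = 0.032           # eta: SGD learning rate

def make_optimizer(model: torch.nn.Module, cfg: FedMKDConfig) -> torch.optim.SGD:
    # Plain SGD as stated in the setup; momentum and weight decay are not
    # mentioned in the quoted text, so none are assumed here.
    return torch.optim.SGD(model.parameters(), lr=cfg.lr)
```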