Federated Submodel Optimization for Hot and Cold Data Features

Authors: Yucheng Ding, Chaoyue Niu, Fan Wu, Shaojie Tang, Chengfei Lyu, Yanghe Feng, Guihai Chen

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We finally evaluate FedSubAvg over several public and industrial datasets. The evaluation results demonstrate that FedSubAvg significantly outperforms FedAvg and its variants."
Researcher Affiliation | Collaboration | Yucheng Ding (Shanghai Jiao Tong University), Chaoyue Niu (Shanghai Jiao Tong University), Fan Wu (Shanghai Jiao Tong University), Shaojie Tang (University of Texas at Dallas), Chengfei Lv (Alibaba Group), Yanghe Feng (National University of Defense Technology), Guihai Chen (Shanghai Jiao Tong University)
Pseudocode | Yes | "Algorithm 1 Federated Submodel Averaging (FedSubAvg)" (a sketch of the element-wise aggregation idea appears after this table)
Open Source Code | Yes | "The code is available on https://github.com/sjtu-yc/federated-submodel-averaging."
Open Datasets | Yes | "Using the public MovieLens, Sentiment140, and Amazon datasets, as well as an industrial dataset from Alibaba, we extensively evaluate FedSubAvg and compare it with FedAvg, FedProx, SCAFFOLD, and FedAdam."
Dataset Splits | No | "We randomly select 20% of the samples as the test dataset and leave the remaining 80% as the training dataset for FL." The paper does not explicitly describe a validation split.
Hardware Specification | Yes | "All experiments were run on a server with 8 NVIDIA 2080Ti GPUs."
Software Dependencies | No | The paper mentions mini-batch SGD and the Adam optimizer but does not specify software dependencies such as programming languages, libraries, or frameworks with version numbers.
Experiment Setup | Yes | "For the tasks of rating classification and sentiment analysis, K = 50 clients are randomly chosen per round as default; and for the CTR prediction tasks, K is set to 100 as default. ... For all the datasets, the batch size for each client is set to 16. The number of local epochs is set to 1 for all the algorithms. We use the Adam optimizer for all the local training process with β1 = 0.9 and β2 = 0.999. The learning rate for each task is tuned using grid search over {0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1}." (The reported split and hyperparameters are mirrored in the setup sketch after this table.)
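To give a rough feel for Algorithm 1, the following is a minimal sketch of the element-wise averaging idea behind FedSubAvg: each coordinate is averaged only over the clients whose submodels involve it, rather than over all participants as in FedAvg. The function and argument names are illustrative, the weighting by local dataset size is an assumption, and Algorithm 1 in the paper remains the authoritative description.

```python
import numpy as np

def fedsubavg_aggregate(global_param, client_updates, client_masks, client_sizes):
    """Illustrative element-wise aggregation in the spirit of FedSubAvg (not the paper's exact rule).

    global_param:   1-D array holding the full global model parameters
    client_updates: per-client update vectors, zero outside each client's submodel
    client_masks:   per-client boolean vectors marking the submodel coordinates
    client_sizes:   per-client local dataset sizes, used as weights (assumption)
    """
    weighted_sum = np.zeros_like(global_param)
    coverage = np.zeros_like(global_param)
    for update, mask, size in zip(client_updates, client_masks, client_sizes):
        weighted_sum += size * update * mask
        coverage += size * mask
    # Average each coordinate only over the clients whose submodels cover it,
    # instead of over all selected clients as plain FedAvg would.
    delta = np.divide(weighted_sum, coverage,
                      out=np.zeros_like(weighted_sum), where=coverage > 0)
    return global_param + delta
```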
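The reported dataset split and training hyperparameters can also be mirrored in a short setup sketch. Only the quoted numbers (80/20 split, batch size 16, 1 local epoch, K clients per round, Adam with β1 = 0.9 and β2 = 0.999, and the learning-rate grid) come from the paper; the toy data and model are placeholders, and the real pipeline lives in the authors' repository.

```python
import torch

# Toy stand-ins so the sketch runs end to end; the actual datasets and models are in the repo.
samples = list(range(1000))                                  # placeholder samples
perm = torch.randperm(len(samples)).tolist()
split = int(0.8 * len(samples))
train_idx, test_idx = perm[:split], perm[split:]             # 80% train / 20% test

BATCH_SIZE = 16            # batch size for each client
LOCAL_EPOCHS = 1           # local epochs per round
CLIENTS_PER_ROUND = 50     # K = 50 by default; K = 100 for the CTR prediction tasks
LR_GRID = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1]

model = torch.nn.Linear(10, 2)                               # placeholder model
for lr in LR_GRID:
    # Adam with the reported betas; the best learning rate is kept after grid search.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, betas=(0.9, 0.999))
    # ... run federated rounds with CLIENTS_PER_ROUND clients, BATCH_SIZE, LOCAL_EPOCHS ...
```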