Seizing Critical Learning Periods in Federated Learning

Authors: Gang Yan, Hao Wang, Jian Li

AAAI 2022, pp. 8788-8796

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we seek critical learning periods in FL with systematic experiments and theoretical analysis, and we emphasize the necessity of seizing the critical learning periods to improve FL training efficiency. Specifically, through a range of carefully designed experiments on different ML models and datasets, we observe the consistent existence of critical learning periods in the FL training process.
Researcher Affiliation | Academia | 1 SUNY-Binghamton University, 2 Louisiana State University; gyan2@binghamton.edu, haowang@lsu.edu, lij@binghamton.edu
Pseudocode | No | The paper describes the steps of the FedAvg algorithm in text and equations but does not provide a formal pseudocode block or algorithm box. (See the FedAvg sketch after this table.)
Open Source Code | No | The paper does not provide any statement about making its source code available, nor does it include a link to a code repository.
Open Datasets | Yes | We perform extensive simulations using two representative ML models: ResNet-18 (He et al. 2016) and CNN, on popular datasets CIFAR-10 and CIFAR-100 (Krizhevsky and Hinton 2009). (See the data-loading sketch after this table.)
Dataset Splits | No | The paper details experimental setups involving 'partial local datasets' and 'recover rounds' for training, and reports test accuracy, but it does not specify a separate validation set or how the data were split into training, validation, and test sets, which would be needed to reproduce the experiment. (See the data-loading sketch after this table.)
Hardware Specification | Yes | The experiments run on PyTorch on Python 3 with NVIDIA RTX 3060 GPU.
Software Dependencies | No | The paper states 'The experiments run on PyTorch on Python 3', but it does not provide specific version numbers for PyTorch or Python, which are necessary for reproducible software dependencies.
Experiment Setup | Yes | We consider a system with N = 64 clients and FedAvg randomly selects a subset of 12 clients in each round. The batch size is 16; the initial learning rate is set to 0.01 with a decay of 0.97 per round; and the SGD solver is adopted with an exponential annealing schedule for the learning rate and a weight decay of 5 × 10⁻⁴. (See the configuration sketch after this table.)
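
Since the paper presents FedAvg only in text and equations, the following is a minimal, illustrative Python/PyTorch sketch of standard FedAvg for orientation. It is not the authors' code; the function names, the single-local-epoch default, and the unweighted averaging are assumptions.

```python
# Minimal FedAvg sketch (illustrative, not the authors' code). Clients hold
# local data loaders; each round the server samples a subset, clients run
# local SGD, and the server averages the returned weights.
import copy
import random

import torch


def local_update(global_model, loader, lr, epochs=1, weight_decay=5e-4):
    """Run local SGD on one client's data and return its updated state dict."""
    model = copy.deepcopy(global_model)
    model.train()
    opt = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=weight_decay)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.state_dict()


def fedavg(global_model, client_loaders, rounds, clients_per_round=12,
           init_lr=0.01, lr_decay=0.97):
    """Standard FedAvg with per-round exponential learning-rate decay."""
    for rnd in range(rounds):
        lr = init_lr * lr_decay ** rnd
        sampled = random.sample(range(len(client_loaders)), clients_per_round)
        states = [local_update(global_model, client_loaders[i], lr) for i in sampled]
        # Unweighted average, assuming equally sized client shards.
        averaged = {k: torch.stack([s[k].float() for s in states]).mean(dim=0)
                    for k in states[0]}
        global_model.load_state_dict(averaged)
    return global_model
```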
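
The paper names the datasets (CIFAR-10 and CIFAR-100) but not how they were partitioned across clients or validated, so the sketch below shows one assumed setup: torchvision downloads of CIFAR-10 with a seeded IID split across 64 clients. The normalization constants and the IID partition are assumptions, not the paper's protocol.

```python
# Assumed data setup for illustration: CIFAR-10 via torchvision, split IID
# across 64 clients with a fixed seed. The paper does not state its partition
# or validation protocol, so this is a guess, not a reproduction recipe.
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    # Commonly used CIFAR-10 statistics; the paper does not report its values.
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

train_set = datasets.CIFAR10("./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10("./data", train=False, download=True, transform=transform)

num_clients = 64
shard_sizes = [len(train_set) // num_clients] * num_clients
shard_sizes[-1] += len(train_set) - sum(shard_sizes)  # absorb any remainder
client_sets = random_split(train_set, shard_sizes,
                           generator=torch.Generator().manual_seed(0))

client_loaders = [DataLoader(s, batch_size=16, shuffle=True) for s in client_sets]
test_loader = DataLoader(test_set, batch_size=256, shuffle=False)
```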
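
The hyperparameters quoted in the Experiment Setup row can be collected into a configuration sketch; the variable names and the per-round learning-rate helper are illustrative, not from the paper.

```python
# Hyperparameters transcribed from the quoted experiment setup; names are
# illustrative, not from the paper.
CONFIG = {
    "num_clients": 64,
    "clients_per_round": 12,
    "batch_size": 16,
    "init_lr": 0.01,
    "lr_decay_per_round": 0.97,
    "weight_decay": 5e-4,
    "models": ["ResNet-18", "CNN"],
    "datasets": ["CIFAR-10", "CIFAR-100"],
}


def lr_at_round(rnd: int) -> float:
    """Exponentially annealed learning rate used for local SGD in round rnd."""
    return CONFIG["init_lr"] * CONFIG["lr_decay_per_round"] ** rnd
```

With these values the learning rate after 100 rounds would be roughly 0.01 × 0.97¹⁰⁰ ≈ 0.00048.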