Neural Tangent Kernel Empowered Federated Learning

Authors: Kai Yue, Richeng Jin, Ryan Pilgrim, Chau-Wai Wong, Dror Baron, Huaiyu Dai

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical results show that the proposed paradigm can achieve the same accuracy while reducing the number of communication rounds by an order of magnitude compared to federated averaging.
Researcher Affiliation | Academia | NC State University; Independent Scholar.
Pseudocode | No | The paper describes the steps of its proposed paradigm in narrative form and refers to equations for mathematical details, but it does not present a structured pseudocode block or a clearly labeled algorithm figure.
Open Source Code | Yes | The implementation is available at https://github.com/KAI-YUE/ntk-fed.
Open Datasets | Yes | We use three datasets, namely, MNIST (LeCun et al., 1998), Fashion-MNIST (Xiao et al., 2017), and FEMNIST (Caldas et al., 2018) digits. ... We evaluate different methods, including the centralized training simulation, data sharing method (Zhao et al., 2018), FedNova (Wang et al., 2020), FedAvg (McMahan et al., 2017), and the proposed NTK-FL on the non-IID CIFAR-10 dataset (Krizhevsky, 2009) and present the results in Figure 7.
Dataset Splits | No | The paper states: "Alternatively, if the server has an available validation dataset, the optimal number of update steps can be selected based on the model validation performance." However, it does not specify whether or how a validation split was used in the authors' own experiments, nor give details on its size or composition.
Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU/CPU models, memory specifications) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as Python or PyTorch versions.
Experiment Setup | Yes | We empirically verify the convergence rate of the proposed method. For FedAvg, we use the number of local iterations from {1, 3, ..., 9, 10, 20, ..., 50} and report the best results. For NTK-FL, we choose t^(k) over the set {100, 200, ..., 2000}. ... For the learning rate η, we search over the set {10^-3, 3×10^-3, 10^-2, 3×10^-2, 10^-1}. The learning rate is fixed during the training. For the client batch size, we set it to 200 for all datasets. We consider a total of 300 clients and select 20 of them with equal probability in each round.
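
For readers who want to mirror the quoted setup, the sketch below encodes the reported hyperparameter grids and the per-round client sampling (300 clients, 20 selected with equal probability each round). It is a minimal illustration under stated assumptions, not the authors' implementation (their repository is linked above); the expansion of {1, 3, ..., 9, 10, 20, ..., 50} and the helper name sample_clients are assumptions made for this example.

```python
# Minimal sketch of the quoted experiment setup; not the authors' code.
# The expansion of {1, 3, ..., 9, 10, 20, ..., 50} and all helper names
# below are assumptions made for illustration.
import random

# Hyperparameter grids quoted in the Experiment Setup row.
FEDAVG_LOCAL_ITERS = [1, 3, 5, 7, 9, 10, 20, 30, 40, 50]  # assumed expansion
NTK_UPDATE_STEPS = list(range(100, 2001, 100))             # {100, 200, ..., 2000}
LEARNING_RATES = [1e-3, 3e-3, 1e-2, 3e-2, 1e-1]            # eta search grid, fixed during training
CLIENT_BATCH_SIZE = 200                                    # same for all datasets

NUM_CLIENTS = 300        # total clients
CLIENTS_PER_ROUND = 20   # sampled with equal probability each round


def sample_clients(round_idx: int) -> list[int]:
    """Uniformly sample 20 of the 300 client indices for one round."""
    rng = random.Random(round_idx)  # deterministic per round for reproducibility
    return rng.sample(range(NUM_CLIENTS), CLIENTS_PER_ROUND)


if __name__ == "__main__":
    # Example: show the participating clients for the first communication
    # round under each candidate learning rate.
    for eta in LEARNING_RATES:
        print(f"eta={eta:g}, batch={CLIENT_BATCH_SIZE}, clients={sample_clients(0)}")
```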