TCT: Convexifying Federated Learning using Bootstrapped Neural Tangent Kernels
Authors: Yaodong Yu, Alexander Wei, Sai Praneeth Karimireddy, Yi Ma, Michael Jordan
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We assess the performance of federated learning algorithms on the image classification tasks FMNIST [80], CIFAR10, and CIFAR100 [41]. Table 2 displays the top-1 accuracy of all algorithms on the three tasks with varying degrees of data heterogeneity. |
| Researcher Affiliation | Academia | Yaodong Yu, UC Berkeley, yyu@eecs.berkeley.edu; Alexander Wei, UC Berkeley, awei@berkeley.edu; Sai Praneeth Karimireddy, UC Berkeley, sp.karimireddy@berkeley.edu; Yi Ma, UC Berkeley, yima@eecs.berkeley.edu; Michael I. Jordan, UC Berkeley, jordan@cs.berkeley.edu |
| Pseudocode | Yes | The detailed description of SCAFFOLD for solving linear regression problems can be found in Algorithm 1, Appendix A. (A hedged sketch of this setup appears below the table.) |
| Open Source Code | Yes | Our code is available at https://github.com/yaodongyu/TCT. |
| Open Datasets | Yes | We assess the performance of federated learning algorithms on the image classification tasks FMNIST [80], CIFAR10, and CIFAR100 [41]. |
| Dataset Splits | No | The paper mentions "test accuracy" and "training images" for datasets like FMNIST and CIFAR10/100. It also refers to tuning learning rates, which implies a validation step, but it does not provide specific details on the validation dataset splits (e.g., percentages or sample counts) needed for reproduction. For instance, "There are 60,000 training images in FMNIST, and 50,000 training images in CIFAR10/100." and "We report the test accuracy of a ResNet-18 after (centralized) retraining of the last layers on CIFAR10." |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., "Python 3.8, PyTorch 1.9, and CUDA 11.1"). |
| Experiment Setup | Yes | Each client uses SGD with weight decay 10^-5 and batch size 64 by default. For each baseline method, we run it for 200 total communication rounds using 5 local training epochs with local learning rate selected from {0.1, 0.01, 0.001} by grid search. For TCT, we run 100 rounds of FedAvg in Stage 1 following the above and use 100 communication rounds in Stage 2 with M = 500 local steps and local learning rate 5 × 10^-5. (A configuration sketch based on these settings appears below the table.) |
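
The Experiment Setup row quotes the full set of training hyperparameters. The snippet below simply collects them into one place as a configuration dictionary; it is an illustrative sketch, and the key names (`BASELINE_CONFIG`, `TCT_CONFIG`, and so on) are assumptions rather than identifiers from the TCT repository.

```python
# Hedged sketch: the hyperparameters quoted in the Experiment Setup row,
# gathered into configuration dicts. Key names are illustrative only.
BASELINE_CONFIG = {
    "optimizer": "SGD",
    "weight_decay": 1e-5,
    "batch_size": 64,
    "communication_rounds": 200,
    "local_epochs": 5,
    "local_lr_grid": [0.1, 0.01, 0.001],  # local learning rate chosen by grid search
}

TCT_CONFIG = {
    "stage1": {  # 100 rounds of FedAvg, otherwise following the baseline settings
        "algorithm": "FedAvg",
        "communication_rounds": 100,
    },
    "stage2": {  # linearized training stage
        "communication_rounds": 100,
        "local_steps": 500,   # M = 500 local steps per round
        "local_lr": 5e-5,
    },
}
```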
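
The Pseudocode row points to Algorithm 1 in Appendix A, which describes SCAFFOLD for linear regression. Below is a minimal NumPy sketch of that setup, assuming the standard SCAFFOLD recipe (client and server control variates, corrected local steps, and the option-II control-variate update); all function and variable names are illustrative and not taken from the paper's code.

```python
# Minimal sketch of SCAFFOLD applied to per-client least-squares problems.
# Assumes full client participation and server learning rate 1; names are illustrative.
import numpy as np

def scaffold_linear_regression(clients, num_rounds=100, local_steps=10, lr=0.1):
    """clients: list of (A_i, b_i) pairs, one least-squares problem per client."""
    d = clients[0][0].shape[1]
    w = np.zeros(d)                       # global model
    c = np.zeros(d)                       # server control variate
    c_i = [np.zeros(d) for _ in clients]  # client control variates

    for _ in range(num_rounds):
        deltas_w, deltas_c = [], []
        for i, (A, b) in enumerate(clients):
            w_local = w.copy()
            for _ in range(local_steps):
                grad = A.T @ (A @ w_local - b) / len(b)   # full-batch least-squares gradient
                w_local -= lr * (grad - c_i[i] + c)        # SCAFFOLD-corrected local step
            # Option-II control-variate update
            c_new = c_i[i] - c + (w - w_local) / (local_steps * lr)
            deltas_w.append(w_local - w)
            deltas_c.append(c_new - c_i[i])
            c_i[i] = c_new
        # Server aggregates model and control-variate updates
        w += np.mean(deltas_w, axis=0)
        c += np.mean(deltas_c, axis=0)
    return w
```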