Preservation of the Global Knowledge by Not-True Distillation in Federated Learning
Authors: Gihun Lee, Minchan Jeong, Yongjin Shin, Sangmin Bae, Se-Young Yun
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experiments, FedNTD shows state-of-the-art performance on various setups without compromising data privacy or incurring additional communication costs. We test our algorithm on MNIST [11], CIFAR-10 [25], CIFAR-100 [25], and CINIC-10 [10]. We compare our FedNTD with various existing works, with results shown in Table 1. |
| Researcher Affiliation | Academia | Gihun Lee*, Minchan Jeong*, Yongjin Shin, Sangmin Bae, Se-Young Yun KAIST {opcrisis, mcjeong, yj.shin, bsmn0223, yunseyoung}@kaist.ac.kr |
| Pseudocode | Yes | Algorithm 1 Federated Not-True Distillation (FedNTD) |
| Open Source Code | Yes | https://github.com/Lee-Gihun/FedNTD |
| Open Datasets | Yes | We test our algorithm on MNIST [11], CIFAR-10 [25], CIFAR-100 [25], and CINIC-10 [10]. |
| Dataset Splits | No | The paper does not explicitly describe a separate validation dataset split with specific percentages or counts. It primarily refers to training and testing. |
| Hardware Specification | No | The provided text does not specify the hardware used for experiments, such as specific GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions 'Pytorch' in a reference but does not specify any software dependencies with version numbers (e.g., Python version, PyTorch version, specific libraries with versions) used for their experiments. |
| Experiment Setup | Yes | We use a momentum SGD with an initial learning rate of 0.1, and the momentum is set as 0.9. The learning rate is decayed with a factor of 0.99 at each round, and a weight decay of 1e-5 is applied. We adopt two different NIID partition strategies: (i) Sharding [37]: sort the data by label and divide the data into same-sized shards, and control the heterogeneity by s, the number of shards per user. (ii) Latent Dirichlet Allocation (LDA) [34, 46]: assigns partition of class c by sampling p_c ~ Dir(α). |
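
The "Pseudocode" row above refers to Algorithm 1 (FedNTD) in the paper. For orientation, the sketch below shows one way a not-true distillation objective can be written in PyTorch: the ground-truth class logit is masked out of both the local and global outputs, and the local model is distilled toward the global model's distribution over the remaining ("not-true") classes. The function name `ntd_loss` and the hyperparameters `tau` and `beta` are illustrative placeholders, not the authors' reference implementation (which is available at the GitHub repository listed above).

```python
import torch
import torch.nn.functional as F

def ntd_loss(local_logits, global_logits, targets, tau=1.0, beta=1.0):
    """Hedged sketch of a not-true distillation objective (placeholder names)."""
    num_classes = local_logits.size(1)

    # Boolean mask that drops the ground-truth class from each row.
    not_true_mask = torch.ones_like(local_logits, dtype=torch.bool)
    not_true_mask.scatter_(1, targets.unsqueeze(1), False)

    local_nt = local_logits[not_true_mask].view(-1, num_classes - 1)
    # The global model acts as a fixed teacher, so no gradient flows through it.
    global_nt = global_logits[not_true_mask].view(-1, num_classes - 1).detach()

    # Temperature-scaled KL distillation restricted to the not-true classes.
    kd = F.kl_div(
        F.log_softmax(local_nt / tau, dim=1),
        F.softmax(global_nt / tau, dim=1),
        reduction="batchmean",
    ) * (tau ** 2)

    # Standard cross-entropy on the full logits plus the weighted distillation term.
    ce = F.cross_entropy(local_logits, targets)
    return ce + beta * kd
```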
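The "Experiment Setup" row reports momentum SGD with an initial learning rate of 0.1, momentum 0.9, weight decay 1e-5, and a per-round learning-rate decay factor of 0.99. A minimal PyTorch sketch of those settings follows; `model` and `num_rounds` are placeholders, and in an actual federated loop the decayed rate would typically be applied when each selected client's local optimizer is constructed for the round.

```python
import torch

model = torch.nn.Linear(10, 10)  # placeholder model
num_rounds = 100                 # placeholder number of communication rounds

# Momentum SGD with the reported hyperparameters.
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-5
)
# Multiply the learning rate by 0.99 after every communication round.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.99)

for round_idx in range(num_rounds):
    # ... local training for the selected clients would happen here ...
    scheduler.step()
```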
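The same row describes the two NIID partition strategies. Below is a minimal NumPy sketch of the LDA (Dirichlet) partition, assuming integer class labels and a per-class proportion vector p_c ~ Dir(α) over clients; smaller α yields more heterogeneous client data. The function name `lda_partition` is a placeholder, not the authors' implementation, and the sharding scheme (sort by label, assign s same-sized shards per user) is not reproduced here.

```python
import numpy as np

def lda_partition(labels, num_clients, alpha, seed=0):
    """Sketch of LDA (Dirichlet) partitioning: split each class across clients
    according to a sampled proportion vector p_c ~ Dir(alpha)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]

    for c in np.unique(labels):
        # Shuffle the sample indices belonging to class c.
        idx_c = rng.permutation(np.where(labels == c)[0])
        # p_c ~ Dir(alpha): how class c is spread across clients.
        p_c = rng.dirichlet(alpha * np.ones(num_clients))
        # Convert proportions to split points within the class indices.
        cuts = (np.cumsum(p_c)[:-1] * len(idx_c)).astype(int)
        for client_id, part in enumerate(np.split(idx_c, cuts)):
            client_indices[client_id].extend(part.tolist())

    return client_indices

# Example usage with toy labels: 5 clients, alpha = 0.5 (more heterogeneous).
toy_labels = np.repeat(np.arange(10), 100)
parts = lda_partition(toy_labels, num_clients=5, alpha=0.5)
```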