Virtual Homogeneity Learning: Defending against Data Heterogeneity in Federated Learning

Authors: Zhenheng Tang, Yonggang Zhang, Shaohuai Shi, Xin He, Bo Han, Xiaowen Chu

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we demonstrate that VHL endows FL with drastically improved convergence speed and generalization performance. VHL is the first attempt towards using a virtual dataset to address data heterogeneity, offering new and effective means to FL. Through comprehensive experiments, we demonstrate that VHL can drastically benefit the convergence speed and the generalization performance of FL models. The code is publicly available: https://github.com/wizard1203/VHL
Researcher Affiliation | Academia | 1 Department of Computer Science, Hong Kong Baptist University; 2 Department of Computer Science and Engineering, The Hong Kong University of Science and Technology; 3 Data Science and Analytics Thrust, The Hong Kong University of Science and Technology (Guangzhou).
Pseudocode | Yes | Algorithm 1 summarizes the training procedure of applying VHL to FedAvg, highlighting modifications to FedAvg. (A hedged sketch of a FedAvg-style round with a shared virtual dataset is given after this table.)
Open Source Code | Yes | The code is publicly available: https://github.com/wizard1203/VHL
Open Datasets | Yes | To verify the effectiveness of VHL, we exploit a popular FedML framework (He et al., 2020) to conduct experiments over various datasets including CIFAR-10 (Krizhevsky & Hinton, 2009), FMNIST (Xiao et al., 2017), SVHN (Netzer et al., 2011), and CIFAR-100 (Krizhevsky & Hinton, 2009).
Dataset Splits | No | The paper describes partitioning methods for producing non-IID client data (Latent Dirichlet Sampling, 2-classes partition, subset partition) and reports test accuracy for evaluation, but it does not explicitly describe a validation split. (A sketch of Dirichlet-based partitioning is given after this table.)
Hardware Specification | No | The paper does not specify any hardware details such as GPU/CPU models or memory used for running the experiments.
Software Dependencies | No | The paper mentions using a FedML framework and specific models such as ResNet-18 and StyleGAN-v2, but it does not provide version numbers for software dependencies such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | The detailed hyper-parameters of each algorithm in each setting are reported in Appendix C. ... The learning rate configuration has been listed in Table 9. ... We use momentum-SGD as optimizers for all experiments, with momentum of 0.9, and weight decay of 0.0001. ... The batch size of the real data is set as 128, which also serves as the batch size of the virtual data. ... For different FL settings, when K = 10 or 100, and E = 1, the maximum communication round is 1000. For K = 10 and E = 5, the maximum communication round is 400 ... The number of clients selected for calculation is 5 per round for K = 10, and 10 for K = 100. (A sketch translating these settings into an optimizer configuration is given after this table.)
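
The Pseudocode row refers to the paper's Algorithm 1, which applies VHL on top of FedAvg. That algorithm is not reproduced here; the following is only a minimal sketch of a FedAvg-style round in which each client additionally trains on a shared virtual batch. The weighting `lam`, the use of plain cross-entropy on the virtual data, and the function names are assumptions, not the paper's objective.

```python
# Minimal sketch (not the paper's Algorithm 1): FedAvg-style training where every
# client also sees a shared "virtual" dataset. The `lam`-weighted virtual-data term
# is an assumption standing in for VHL's actual objective.
import copy
import torch
import torch.nn.functional as F

def client_update(global_state, model, real_loader, virtual_loader, lr, epochs=1, lam=1.0):
    """Local training on one client: cross-entropy on real data plus an assumed
    cross-entropy term on the shared virtual data."""
    model.load_state_dict(global_state)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=1e-4)
    model.train()
    for _ in range(epochs):
        for (x, y), (xv, yv) in zip(real_loader, virtual_loader):
            opt.zero_grad()
            loss = F.cross_entropy(model(x), y)                 # standard FedAvg term
            loss = loss + lam * F.cross_entropy(model(xv), yv)  # assumed virtual-data term
            loss.backward()
            opt.step()
    return copy.deepcopy(model.state_dict())

def server_aggregate(client_states, client_sizes):
    """FedAvg aggregation: sample-size-weighted average of client weights.
    Non-float buffers (e.g., BatchNorm counters) are averaged after casting to float."""
    total = float(sum(client_sizes))
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = sum(state[key].float() * (n / total)
                       for state, n in zip(client_states, client_sizes))
    return avg
```

A faithful VHL implementation would replace the assumed virtual-data term with the paper's feature-calibration objective; the repository linked above (https://github.com/wizard1203/VHL) contains the authors' actual training loop.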
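
The Dataset Splits row notes that non-IID client data is produced by partitioning methods such as Latent Dirichlet Sampling. As a reference point, the sketch below shows a common way such a partition is implemented; the concentration parameter `alpha`, the seed, and the function name are illustrative choices, not values taken from the paper.

```python
# Sketch of a Dirichlet-based non-IID partition: for each class, the fraction of its
# samples assigned to each client is drawn from Dir(alpha). Smaller alpha = more skewed.
import numpy as np

def dirichlet_partition(labels, num_clients, alpha=0.5, seed=0):
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        props = rng.dirichlet(alpha * np.ones(num_clients))    # class-c share per client
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)  # split points inside idx
        for client_id, part in enumerate(np.split(idx, cuts)):
            client_indices[client_id].extend(part.tolist())
    return client_indices
```

For example, on CIFAR-10 with K = 10 clients, `dirichlet_partition(train_labels, 10, alpha=0.5)` returns ten index lists that can be wrapped in `torch.utils.data.Subset` objects to build per-client loaders.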
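
The Experiment Setup row quotes the shared hyper-parameters: momentum-SGD with momentum 0.9 and weight decay 0.0001, batch size 128 for both real and virtual data, and round/client counts that depend on K and E. A direct translation into PyTorch might look as follows; the dictionary layout and names are mine, and learning rates are left as an input because the paper defers them to its Table 9.

```python
# Sketch of the quoted experiment setup. Only values stated in the row above are
# hard-coded; learning rates come from Table 9 of the paper and are passed in.
import torch

FL_SETTINGS = {
    # (K clients, E local epochs): (max communication rounds, clients sampled per round)
    (10, 1): (1000, 5),
    (100, 1): (1000, 10),
    (10, 5): (400, 5),
}

BATCH_SIZE_REAL = 128
BATCH_SIZE_VIRTUAL = 128  # "also serves as the batch size of the virtual data"

def make_optimizer(model, lr):
    return torch.optim.SGD(model.parameters(), lr=lr,
                           momentum=0.9, weight_decay=1e-4)
```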