One-Pass Distribution Sketch for Measuring Data Heterogeneity in Federated Learning

Authors: Zichang Liu, Zhaozhuo Xu, Benjamin Coleman, Anshumali Shrivastava

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct empirical evaluations to answer three questions: (1) Does the one-pass sketch distance reflect the differences between distributions? (2) Does the sketch distance help convergence in FL? (3) Does the sketch distance retrieve the best-personalized models? To answer these three questions, we conducted three sets of experiments.
Researcher Affiliation | Collaboration | Zichang Liu, Rice University, zichangliu@rice.edu; Zhaozhuo Xu, Stevens Institute of Technology, zxu79@stevens.edu; Benjamin Coleman, Rice University, benjamin.ray.coleman@gmail.com; Anshumali Shrivastava, Rice University & ThirdAI Corp., anshumali@rice.edu ... Now with Google DeepMind.
Pseudocode | Yes | Algorithm 1: One-Pass Distribution Sketch (an illustrative code sketch is given after the table)
Open Source Code | Yes | Code is available at https://github.com/lzcemma/RACE_Distance
Open Datasets | Yes | Dataset: We evaluate Algorithm 3 and Algorithm 2 on both vision and language datasets. For visual classification, we use the MNIST dataset [51] and FEMNIST [52]. ... We also use the Shakespeare next-character prediction dataset [6] for language-based FL.
Dataset Splits | No | The paper mentions 'train' and 'test' sets but does not explicitly describe a validation set or split.
Hardware Specification | Yes | Our FL codebase, including FL workflow, LSH functions, and proposed algorithms, is implemented on PyTorch [55]. We test Algorithm 3 and Algorithm 2 on a server with 8 Nvidia Tesla V100 GPUs and a 48-core/96-thread processor (Intel(R) Xeon(R) Gold 5220R CPU @ 2.20GHz).
Software Dependencies | No | Our FL codebase, including FL workflow, LSH functions, and proposed algorithms, is implemented on PyTorch [55].
Experiment Setup | Yes | For the MNIST dataset (both MNIST and MNIST Uniform + Dirichlet), both Algorithm 3 and FedAvg are trained for 200 rounds. In each round, K = 3 clients are selected from L active clients. Next, each client is trained for 20 epochs with batch size 32 and learning rate η = 0.0001.
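
The following is a minimal, hedged sketch of a one-pass distribution sketch in the spirit of Algorithm 1, assuming a RACE-style construction (an R x B array of LSH counters updated in a single pass over the data). The signed-random-projection hash, the array shape, and the L2 distance over normalized counts are illustrative assumptions, not the paper's exact choices or hyperparameters.

```python
import numpy as np

class OnePassSketch:
    """R x B count array; each row uses an independent LSH function (assumed SRP)."""

    def __init__(self, dim, num_rows=50, num_bits=4, seed=0):
        rng = np.random.default_rng(seed)
        self.num_rows = num_rows
        self.num_buckets = 2 ** num_bits
        # Signed random projections: num_bits hyperplanes per row.
        self.planes = rng.standard_normal((num_rows, num_bits, dim))
        self.counts = np.zeros((num_rows, self.num_buckets))
        self.n = 0

    def add(self, x):
        # One pass over the data: each point increments exactly one bucket per row.
        bits = (np.einsum("rbd,d->rb", self.planes, x) > 0).astype(int)
        buckets = bits @ (1 << np.arange(bits.shape[1]))
        self.counts[np.arange(self.num_rows), buckets] += 1
        self.n += 1


def sketch_distance(s1, s2):
    # Assumed distance: L2 norm between row-normalized count arrays.
    return np.linalg.norm(s1.counts / s1.n - s2.counts / s2.n)


# Toy usage: two clients must share the same hash functions (same seed) to be comparable.
rng = np.random.default_rng(1)
a, b = OnePassSketch(dim=784, seed=7), OnePassSketch(dim=784, seed=7)
for _ in range(1000):
    a.add(rng.standard_normal(784))        # client A: standard Gaussian
    b.add(rng.standard_normal(784) + 0.5)  # client B: mean-shifted Gaussian
print(sketch_distance(a, b))               # larger than for two i.i.d. clients
```

Because each data point touches only one counter per row, the sketch is built in a single pass and clients only need to exchange the small count arrays, not raw data.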
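
For reference, the quoted MNIST setup can be captured in a small configuration sketch. Only the numbers above come from the paper; the variable names are hypothetical and not from the released code.

```python
# Hypothetical config mirroring the quoted MNIST experiment setup.
mnist_config = dict(
    dataset="MNIST",          # also MNIST Uniform + Dirichlet
    rounds=200,               # communication rounds for Algorithm 3 and FedAvg
    clients_per_round=3,      # K = 3 selected from L active clients
    local_epochs=20,
    batch_size=32,
    learning_rate=1e-4,       # eta = 0.0001
)
```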