On Privacy and Personalization in Cross-Silo Federated Learning

Authors: Ken Liu, Shengyuan Hu, Steven Z. Wu, Virginia Smith

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide an empirical study of competing methods as well as a theoretical characterization of MR-MTL for mean estimation, highlighting the interplay between privacy and cross-silo data heterogeneity. Our work serves to establish baselines for private cross-silo FL as well as identify key directions of future work in this area. Fig. 3 shows the privacy-utility tradeoffs across four datasets. We observe that MR-MTL consistently outperforms a suite of baseline methods, and that it performs at least as well as local training and FedAvg (the endpoints of the personalization spectrum), except at high-privacy regimes (possibly different for each dataset).
Researcher Affiliation | Academia | Carnegie Mellon University; {kzliu, shengyuanhu, zstevenwu, smithv}@cmu.edu
Pseudocode | Yes | Of particular interest is the family of mean-regularized multi-task learning (MR-MTL) methods [30, 94, 38, 37] (see Algorithm A1 for a typical instantiation). A hedged MR-MTL sketch appears after this table.
Open Source Code | Yes | Code is available at https://github.com/kenziyuliu/private-cross-silo-fl.
Open Datasets | Yes | We consider four cross-silo datasets that span regression/classification and convex/nonconvex tasks: Vehicle [26], School [35], Google Glass (GLEAM) [82], and CIFAR-10 [53]. See Appendix C.1 for more details and datasets.
Dataset Splits | Yes | For Vehicle, School, and GLEAM, we adopt the existing dataset splits as described in [57, 59] and [16] respectively, and for CIFAR-10, we generated heterogeneous client splits as described in [94, 90]. See Appendix C.1 for more details and datasets. A hedged sketch of one common heterogeneous-split procedure appears after this table.
Hardware Specification | No | The paper states 'All experiments are implemented in JAX [14] with Haiku [42] and run on GPU machines. The total computational budget for the main experiments is about 500 GPU hours.', but it does not specify the type or model of the GPUs used.
Software Dependencies | No | The paper mentions software such as 'JAX [14] with Haiku [42]' but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | For all methods, we use minibatch DP-SGD in each silo to satisfy silo-specific sample-level privacy; while certain methods may have more efficient solvers (e.g. dual form for [91]), we want compatibility with DP-SGD as well as privacy amplification via sampling on the example level for tight accounting. For all experiments, silos train for 1 local epoch in every round (except for [57], which runs 2 epochs). Hyperparameter tuning is done via grid search for all methods. The learning rate is fixed to 0.005 for all datasets and models, with cosine annealing to 0. A hedged sketch of a silo-local DP-SGD step with this schedule appears after this table.
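
To make the MR-MTL row above more concrete, here is a minimal sketch of a typical mean-regularized multi-task learning loop for a linear model with squared loss. It is not the paper's Algorithm A1; the function names, the least-squares loss, and the hyperparameter defaults are illustrative assumptions. Each silo minimizes its local loss plus (lam/2) * ||w_k - w_bar||^2, where w_bar is the average of the silo models from the previous round.

```python
# Hedged sketch of mean-regularized multi-task learning (MR-MTL) for a linear
# model with squared loss. Not the paper's Algorithm A1; names and defaults
# here are illustrative assumptions.
import numpy as np

def local_update(w_k, w_bar, X, y, lam, lr=0.005, epochs=1, batch_size=32):
    """One round of local training for silo k on (X, y) with squared loss."""
    n = X.shape[0]
    for _ in range(epochs):
        idx = np.random.permutation(n)
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            grad_loss = X[b].T @ (X[b] @ w_k - y[b]) / len(b)  # local squared-loss gradient
            grad_reg = lam * (w_k - w_bar)                      # pull toward the silo-model mean
            w_k = w_k - lr * (grad_loss + grad_reg)
    return w_k

def mr_mtl(silos, d, lam, rounds=50):
    """silos: list of (X, y) pairs, one per silo. Returns personalized models."""
    models = [np.zeros(d) for _ in silos]
    for _ in range(rounds):
        w_bar = np.mean(models, axis=0)               # server averages silo models
        models = [local_update(w, w_bar, X, y, lam)   # silos train in parallel
                  for w, (X, y) in zip(models, silos)]
    return models
```

Setting lam = 0 recovers purely local training, while a very large lam forces all silo models toward the shared mean, approaching FedAvg-like behavior; this is the personalization spectrum referenced in the Research Type row.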
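For the Dataset Splits row, one common way to generate heterogeneous (non-IID) client splits for a labeled dataset such as CIFAR-10 is a Dirichlet partition over labels. This is only an illustration of what "heterogeneous client splits" can mean; the exact procedure of [94, 90] may differ, and the function name and alpha default below are assumptions.

```python
# Hedged sketch of a Dirichlet-based heterogeneous client split.
# Smaller alpha gives more skewed (more heterogeneous) label distributions.
import numpy as np

def dirichlet_split(labels, num_clients, alpha=0.5, seed=0):
    """Return a list of example-index arrays, one per client."""
    rng = np.random.default_rng(seed)
    num_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Split this class's examples across clients according to Dirichlet weights.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, chunk in enumerate(np.split(idx, cuts)):
            client_indices[client_id].extend(chunk.tolist())
    return [np.array(ix) for ix in client_indices]
```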
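Finally, for the Experiment Setup row, below is a minimal sketch of a silo-local minibatch DP-SGD step (per-example gradient clipping plus Gaussian noise) together with the cosine-annealed learning rate described in the table (0.005 decayed to 0). The squared loss, clip norm, and noise multiplier are illustrative assumptions, not the authors' exact training loop; privacy accounting and Poisson subsampling are omitted.

```python
# Hedged sketch of a silo-local DP-SGD step in JAX, with cosine LR annealing.
# Loss, clip_norm, and noise_mult are illustrative assumptions.
import jax
import jax.numpy as jnp

def loss_fn(w, x, y):
    return 0.5 * (jnp.dot(x, w) - y) ** 2            # single-example squared loss

def dp_sgd_step(w, xb, yb, key, lr, clip_norm=1.0, noise_mult=1.0):
    # Per-example gradients via vmap, then clip each gradient to norm <= clip_norm.
    grads = jax.vmap(jax.grad(loss_fn), in_axes=(None, 0, 0))(w, xb, yb)
    norms = jnp.linalg.norm(grads, axis=1, keepdims=True)
    clipped = grads * jnp.minimum(1.0, clip_norm / (norms + 1e-12))
    # Sum clipped gradients, add calibrated Gaussian noise, average over the batch.
    noise = noise_mult * clip_norm * jax.random.normal(key, w.shape)
    noisy_grad = (clipped.sum(axis=0) + noise) / xb.shape[0]
    return w - lr * noisy_grad

def cosine_lr(step, total_steps, base_lr=0.005):
    # Cosine annealing from base_lr down to 0 over total_steps.
    return 0.5 * base_lr * (1.0 + jnp.cos(jnp.pi * step / total_steps))
```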