Federated Principal Component Analysis

Authors: Andreas Grammenos, Rodrigo Mendoza Smith, Jon Crowcroft, Cecilia Mascolo

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Numerical simulations show that, while using limited-memory, our algorithm exhibits performance that closely matches or outperforms traditional non-federated algorithms, and in the absence of communication latency, it exhibits attractive horizontal scalability." All experiments ran on a workstation with an AMD 1950X CPU (16 cores at 4.0 GHz), 128 GB 3200 MHz DDR4 RAM, and Matlab R2020a (build 9.8.0.1380330). Code and datasets are publicly available at https://www.github.com/andylamp/federated_pca. To quantify the utility loss introduced by differential privacy, the quality of the projections is compared on the MNIST standard test set [30] and the Wine [10] dataset.
Researcher Affiliation | Collaboration | Andreas Grammenos (1,3), Rodrigo Mendoza-Smith (2), Jon Crowcroft (1,3), Cecilia Mascolo (1). 1: Computer Lab, University of Cambridge; 2: Quine Technologies; 3: Alan Turing Institute.
Pseudocode | Yes | "Our procedure is presented in Alg. 1. ... Merge and FPCA-Edge are described in Algs. 2 and 3." Algorithm 1: Federated PCA (FPCA); Algorithm 2: Merger [46, 17]; Algorithm 3: Federated PCA Edge (FPCA-Edge).
Open Source Code | Yes | "To foster reproducibility both code and datasets used for our numerical evaluation are made publicly available at: https://www.github.com/andylamp/federated_pca."
Open Datasets | Yes | To quantify the utility loss introduced by differential privacy, the quality of the projections is compared using "the MNIST standard test set [30] and Wine [10] datasets which contain, respectively, 10000 labelled images of handwritten digits and physicochemical data for 6498 variants of red and white wine."
Dataset Splits | No | The paper evaluates on the MNIST standard test set and the Wine dataset, but it does not specify explicit training/validation/test splits, percentages, or a partitioning methodology beyond the implied use of standard test sets; FPCA is applied to "the same datasets" without detailing how the data was split for training or validation of the FPCA model itself.
Hardware Specification | Yes | "All our experiments were computed on a workstation using an AMD 1950X CPU with 16 cores at 4.0GHz, 128 GB 3200 MHz DDR4 RAM"
Software Dependencies | Yes | "Matlab R2020a (build 9.8.0.1380330)"
Experiment Setup | Yes | "Then, on the same datasets, we applied FPCA with rank estimate r = 6, block size b = 25, and DP budget (ε, δ) = (0.1, 0.1)." To evaluate the utility loss with respect to the privacy-accuracy trade-off, δ = 0.01 is fixed and q_A = ⟨v1, v̂1⟩ is plotted for ε ∈ {0.1k : k ∈ {1, ..., 40}}, where v1 and v̂1 are defined as in Lemma 2. Synthetic data is generated from a power-law spectrum, Y_α ∼ Synth(α) ∈ R^(d×n), using α ∈ {0.01, 0.1, 0.5, 1}.
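The Merger of Algorithm 2 is attributed to [46, 17] and not reproduced in this report. As a rough illustration of what such a step computes, here is a minimal Python sketch of the standard subspace-merge construction from the incremental-SVD literature: take the SVD of the concatenated, singular-value-weighted local bases and truncate back to rank r. This is an assumption about the general form; the paper's exact weighting and forgetting factors may differ.

```python
import numpy as np

def merge(U1, S1, U2, S2, r):
    """Merge two rank-r subspace estimates (U_i, S_i) into one.

    Sketch of a generic subspace merge: SVD the concatenation of the
    singular-value-weighted bases, then truncate to rank r. Not the
    paper's exact Merger (Alg. 2), whose weighting may differ.
    """
    C = np.hstack([U1 * S1, U2 * S2])              # d x 2r weighted bases
    U, S, _ = np.linalg.svd(C, full_matrices=False)
    return U[:, :r], S[:r]

# Toy usage: two local PCA sketches of halves of the same d=20 dataset.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 200))
U1, S1, _ = np.linalg.svd(X[:, :100], full_matrices=False)
U2, S2, _ = np.linalg.svd(X[:, 100:], full_matrices=False)
r = 3
U, S = merge(U1[:, :r], S1[:r], U2[:, :r], S2[:r], r)
print(U.shape)   # merged rank-3 basis of the ambient 20-dim space
```

In a federated tree of clients, pairwise application of such a merge aggregates the edge-computed subspaces without ever sharing raw data.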
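Neither the synthetic generator Synth(α) nor the computation of q_A is fully specified in this excerpt. A hypothetical Python sketch, assuming Synth(α) draws a matrix with power-law singular spectrum σ_i = i^(−α) and random orthonormal factors, and measuring the utility q = |⟨v1, v̂1⟩| between the true leading principal direction and an estimate (here, from a noise-perturbed copy standing in for a DP output):

```python
import numpy as np

def synth(alpha, d, n, rng):
    """Hypothetical Synth(alpha): data with singular spectrum i**(-alpha)
    and random orthonormal left/right factors. The paper's generator is
    not specified in this excerpt; this is one plausible construction."""
    U, _ = np.linalg.qr(rng.standard_normal((d, d)))
    V, _ = np.linalg.qr(rng.standard_normal((n, d)))
    sigma = np.arange(1, d + 1, dtype=float) ** (-alpha)
    return U @ np.diag(sigma) @ V.T                # d x n

rng = np.random.default_rng(1)
Y = synth(0.5, 50, 200, rng)                       # alpha = 0.5 example

# Utility metric q = |<v1, v1_hat>|: overlap of the true leading
# principal direction with an estimate from a perturbed copy of Y.
v1 = np.linalg.svd(Y, full_matrices=False)[0][:, 0]
v1_hat = np.linalg.svd(Y + 1e-3 * rng.standard_normal(Y.shape),
                       full_matrices=False)[0][:, 0]
q = abs(v1 @ v1_hat)                               # near 1 for small noise
print(round(q, 3))
```

Sweeping the perturbation scale (in the paper, the DP noise driven by ε) and plotting q against ε reproduces the shape of the privacy-accuracy trade-off curve described above.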