Federated Nearest Neighbor Classification with a Colony of Fruit-Flies

Authors: Parikshit Ram, Kaushik Sinha (pp. 8036-8044)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we demonstrate that (i) FlyNN matches NNC accuracy across 70 OpenML datasets, and (ii) FlyNNFL training is highly scalable with low communication overhead, providing up to an 8× speedup with 16 parties.
Researcher Affiliation | Collaboration | Parikshit Ram¹, Kaushik Sinha² (¹IBM Research AI, ²Wichita State University)
Pseudocode | Yes | Algorithm 1: FlyNN training with training set S ⊂ ℝᵈ × [L], lifted dimensionality m ∈ ℕ, s ≪ d nonzeros in each row of the lifting matrix M, ρ ≪ m nonzeros in the FlyHash, decay rate γ ∈ [0, 1), random seed R, and inference with test point x ∈ ℝᵈ. Algorithm 2: Federated Differentially Private FlyNN training with τ parties Vₜ, t ∈ [τ], each with training set Sₜ, with DP parameter ε and number of samples T. The Boolean IS_DP toggles DP.
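The row above summarizes Algorithms 1 and 2 of the paper. Below is a minimal single-party sketch of FlyNN training and inference, assuming the standard FlyHash construction (a sparse binary lifting followed by winner-take-all top-ρ binarization) and one decayed Fly Bloom Filter per class; all function names are illustrative and do not come from the authors' released code, and the federated/DP aggregation of Algorithm 2 is omitted.

```python
import numpy as np

def make_lifting_matrix(d, m, s, rng):
    # Binary lifting matrix M: each of the m rows has exactly s nonzeros.
    M = np.zeros((m, d))
    for i in range(m):
        M[i, rng.choice(d, size=s, replace=False)] = 1.0
    return M

def fly_hash(M, x, rho):
    # Winner-take-all: binary code with 1s at the top-rho activations of Mx.
    h = np.zeros(M.shape[0])
    h[np.argsort(M @ x)[-rho:]] = 1.0
    return h

def train_flynn(X, y, num_labels, m, s, rho, gamma, seed=0):
    rng = np.random.default_rng(seed)
    M = make_lifting_matrix(X.shape[1], m, s, rng)
    W = np.ones((num_labels, m))   # one Fly Bloom Filter per class, all ones
    for x, label in zip(X, y):
        active = fly_hash(M, x, rho) > 0
        W[label, active] *= gamma  # decay bits activated by same-class points
    return M, W

def predict_flynn(M, W, x, rho):
    # Predict the class whose filter scores x as least novel (lowest score).
    return int(np.argmin(W @ fly_hash(M, x, rho)))
```

Since a class filter is only decayed at bits that its own training points activate, a test point hashing onto heavily decayed bits yields a low score for that class, so argmin recovers the most familiar class.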
Open Source Code | Yes | The implementation details and compute resources used are described in Ram and Sinha (2021a, Appendix D.1), and the relevant code is available at https://github.com/rithram/flynn.
Open Datasets | Yes | We consider 70 classification datasets from OpenML (Van Rijn et al. 2013) to evaluate the performance of FlyNN on real datasets, thoroughly comparing FlyNN to NNC. See Ram and Sinha (2021a, Appendix D.3) for details. We consider the high-dimensional vision datasets MNIST (LeCun 1995), Fashion-MNIST (Xiao, Rasul, and Vollgraf 2017), and CIFAR (Krizhevsky, Hinton et al. 2009) from the TensorFlow package (Abadi et al. 2016) for evaluating the scaling of FlyNNFL training when the data is distributed between multiple parties.
Dataset Splits | Yes | We compute the normalized accuracy for a method on a dataset as (1 − a/aₖ), where aₖ is the best tuned 10-fold cross-validated accuracy of kNNC on this dataset and a is the best tuned 10-fold cross-validated accuracy obtained by the method on this dataset.
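The normalized-accuracy formula quoted above is simple enough to state directly in code; this helper is an illustration, not taken from the authors' repository.

```python
def normalized_accuracy(a, a_k):
    """Accuracy gap of a method relative to the best tuned 10-fold
    cross-validated kNNC accuracy a_k on the same dataset.
    0 means the method matches kNNC; positive values mean it is worse."""
    return 1.0 - a / a_k
```

For example, a method reaching the same accuracy as kNNC scores 0, while one reaching half of kNNC's accuracy scores 0.5.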
Hardware Specification | No | The paper mentions that 'compute resources used are described in Ram and Sinha (2021a, Appendix D.1)', but this information is not available in the provided text. No specific hardware details (GPU/CPU models, memory, etc.) are found in the paper.
Software Dependencies | No | The paper mentions the 'TensorFlow package (Abadi et al. 2016)' but does not provide a specific version number. No other software dependencies are listed with version numbers.
Experiment Setup | Yes | For a dataset with d dimensions, we tune across 60 FlyNN hyper-parameter settings in the following ranges: m ∈ [2d, 2048d], s ∈ [2, ⌈0.5d⌉], ρ ∈ [8, 256], and γ ∈ [0, 0.8]. We use this hyperparameter space for all experiments, except for the vision sets, where we use m ∈ [2d, 1024d].
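One way to realize the hyper-parameter space above is random sampling of 60 configurations within the stated ranges; the sampler below is a hypothetical sketch (the paper does not specify how the 60 settings are chosen), with m restricted to integer multiples of d as the range notation suggests.

```python
import numpy as np

def sample_flynn_configs(d, n_configs=60, max_lift=2048, seed=0):
    """Sample hyper-parameter settings within the paper's stated ranges:
    m in [2d, 2048d], s in [2, ceil(0.5 d)], rho in [8, 256], gamma in [0, 0.8].
    The uniform-sampling strategy itself is an assumption."""
    rng = np.random.default_rng(seed)
    s_hi = int(np.ceil(0.5 * d))
    configs = []
    for _ in range(n_configs):
        configs.append({
            "m": int(rng.integers(2, max_lift + 1)) * d,  # multiple of d
            "s": int(rng.integers(2, max(3, s_hi + 1))),
            "rho": int(rng.integers(8, 257)),
            "gamma": float(rng.uniform(0.0, 0.8)),
        })
    return configs
```

For the vision datasets, passing `max_lift=1024` reproduces the narrower m ∈ [2d, 1024d] range quoted above.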