Private and Personalized Frequency Estimation in a Federated Setting
Authors: Amrith Setlur, Vitaly Feldman, Kunal Talwar
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we provide an extensive empirical evaluation of our private and non-private algorithms under varying levels of statistical and size heterogeneity on the Reddit, Stack Overflow, and Amazon Reviews datasets. Our results demonstrate significant improvements over standard and clustering-based baselines. |
| Researcher Affiliation | Collaboration | Amrith Setlur (Carnegie Mellon University) asetlur@cs.cmu.edu; Vitaly Feldman (Apple) vitaly.edu@gmail.com; Kunal Talwar (Apple) ktalwar@apple.com |
| Pseudocode | Yes | Algorithm 1 (User Output), Algorithm 2 (Non-Private Center), Algorithm 3 (Histogram Cluster), Algorithm 4 (Private Center), Algorithm 5 (Private Init), Algorithm 6 (End-to-End Algorithm). |
| Open Source Code | No | Code is not yet ready for anonymous open sourcing. We plan to open-source the code with appropriate licensing for the updated version of the paper. |
| Open Datasets | Yes | We evaluate methods on three real-world datasets: Reddit [16], Stack Overflow [6], and Amazon Reviews [62]. |
| Dataset Splits | Yes | For all datasets, we partition each client's data into 60:40 train/test splits. Additionally, we set aside 5% of users in each dataset as a validation set to tune the cluster count K, the privacy clip bound c, etc. (split sketch below the table) |
| Hardware Specification | Yes | None of our experiments require very high computational resources; all can be run with one 3090Ti card. |
| Software Dependencies | No | We use the NLTK tokenizer [11] with a vocabulary of size 10k tokens, and for the other two datasets, we use the Huggingface (bert-base-uncased) tokenizer [74] with a vocabulary size of 32k tokens. (tokenizer sketch below the table) |
| Experiment Setup | Yes | We validate hyperparameters for our algorithms and baselines using a held-out validation set of users. For clustering we use K = 10... We run clustering for T = 50 iterations non-privately and T = 20 iterations privately... We tune the finetuning parameter λ by sweeping across {0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5} and find λ = 0.25, λ = 0.15, and λ = 0.1 to be optimal... We train MAML and IFCA using SGD with momentum, with learning rate 0.01 and momentum parameter 0.9. For RTFA, we find a proximal regularization parameter of 0.2 to be optimal... For private training, we use a clipping threshold of c = 0.1 in Alg. 4 and c = 4.0 in Alg. 5. (configuration sketch below the table) |
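
Below is a minimal sketch of the per-client 60:40 train/test split with 5% of users held out for validation, as quoted in the Dataset Splits row. The function name `split_clients` and the data layout (a dict mapping each user to a token list) are illustrative assumptions, not taken from the paper's code.

```python
import random

def split_clients(client_data, train_frac=0.6, val_user_frac=0.05, seed=0):
    """Hypothetical helper: 60:40 per-client train/test split, with 5% of users
    set aside as a validation pool for tuning K, the clip bound c, etc."""
    rng = random.Random(seed)
    users = list(client_data)
    rng.shuffle(users)
    n_val_users = max(1, int(val_user_frac * len(users)))
    val_users = set(users[:n_val_users])

    splits = {}
    for user, tokens in client_data.items():
        cut = int(train_frac * len(tokens))
        splits[user] = {
            "held_out_for_validation": user in val_users,
            "train": tokens[:cut],  # first 60% of this client's tokens
            "test": tokens[cut:],   # remaining 40%
        }
    return splits
```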
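
The Software Dependencies row names two tokenizers without pinning versions. The snippet below shows, under stated assumptions, how they are typically loaded; the 10k/32k vocabulary restriction procedure is not described in the quoted text, so only the tokenizer calls are sketched.

```python
# Tokenizer-loading sketch (assumed usage, not the paper's actual code).
import nltk
from nltk.tokenize import word_tokenize
from transformers import AutoTokenizer

nltk.download("punkt")  # word_tokenize relies on the punkt models

nltk_tokens = word_tokenize("federated frequency estimation")   # NLTK tokenizer [11]
bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")    # Huggingface WordPiece tokenizer [74]
bert_tokens = bert_tok.tokenize("federated frequency estimation")
```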
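
The Experiment Setup row lists the reported hyperparameters; the dictionary below simply collects them in one place for readability. All key names are illustrative, and the per-dataset assignment of the three optimal λ values, as well as attributing the c = 4.0 clip bound to Alg. 5, are assumptions not stated in the quoted text.

```python
# Hypothetical configuration summary of the quoted hyperparameters.
experiment_config = {
    "num_clusters_K": 10,
    "clustering_iterations": {"non_private": 50, "private": 20},
    "finetune_lambda_sweep": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5],
    "finetune_lambda_optimal": [0.25, 0.15, 0.1],  # one per dataset; mapping not stated in the quote
    "maml_ifca_optimizer": {"name": "sgd", "lr": 0.01, "momentum": 0.9},
    "rtfa_proximal_reg": 0.2,
    "private_clip_thresholds": {
        "alg4_private_center": 0.1,
        "alg5_private_init": 4.0,  # assumed Alg. 5; the source text repeats "Alg. 4"
    },
}
```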