Mean Estimation with User-level Privacy under Data Heterogeneity

Authors: Rachel Cummings, Vitaly Feldman, Audra McMillan, Kunal Talwar

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this work we propose a simple model of heterogeneous user data that differs in both distribution and quantity of data, and we provide a method for estimating the population-level mean while preserving user-level differential privacy. We demonstrate asymptotic optimality of our estimator and also prove general lower bounds on the error achievable in our problem.
Researcher Affiliation | Collaboration | Rachel Cummings, Department of Industrial Engineering and Operations Research, Columbia University, New York, NY 10027, rac2239@columbia.edu; Vitaly Feldman, Apple, Cupertino, CA 95014; Audra McMillan, Apple, Cupertino, CA 95014, audra.mcmillan@apple.com; Kunal Talwar, Apple, Cupertino, CA 95014, ktalwar@apple.com
Pseudocode | Yes | Algorithm 1: Non-private Heterogeneous Mean Estimation. Algorithm 2: Private Heterogeneous Mean Estimation.
Open Source Code | No | The paper does not provide any explicit statements about releasing source code, nor does it include links to a code repository.
Open Datasets | No | The paper is theoretical and does not describe experiments performed on a specific, publicly available dataset. It discusses data in a general, abstract sense (e.g., 'user data is heterogeneous', 'each user generates multiple data points').
Dataset Splits | No | The paper is theoretical and does not describe experiments that would involve dataset splits for training, validation, or testing.
Hardware Specification | No | The paper is theoretical and does not describe any computational experiments or their hardware specifications.
Software Dependencies | No | The paper is theoretical and does not describe any computational experiments that would require specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe any computational experiments or their setup details, such as hyperparameters or training configurations.