Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Sum Estimation under Personalized Local Differential Privacy

Authors: Dajun Sun, Wei Dong, Yuan Qiu, Ke Yi, Graham Cormode

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on both synthetic and real data show that our protocols significantly outperform existing methods in terms of accuracy while providing a strong level of privacy.
Researcher Affiliation | Academia | Dajun Sun¹, Wei Dong², Yuan Qiu³, Ke Yi¹, Graham Cormode⁴; ¹Hong Kong University of Science and Technology, ²Nanyang Technological University, ³Southeast University, ⁴University of Warwick
Pseudocode | Yes | Algorithm 1: Local Sum-R (Randomizer); Algorithm 2: Local Sum-A (Analyzer); Algorithm 3: Local Hist-r; Algorithm 4: Local Hist-a
Open Source Code | Yes | The corresponding codes and data are provided in the GitHub repository: https://github.com/personalizedldp/PLCDP
Open Datasets | Yes | The real-world data we used is the MNIST (train) dataset [12]
Dataset Splits | No | The real-world data we used is the MNIST (train) dataset [12], which consists of 60,000 images of handwritten digits, where each image is represented by a vector of dimension 28 × 28 = 784 and each coordinate is an integer ranging from 0 to 255. We perform sum estimation for each digit separately and treat each image as an individual's data. (The paper describes the usage of the MNIST (train) dataset for sum estimation but does not specify train/test/validation splits for model training, as the task is sum estimation over the dataset.)
Hardware Specification | Yes | All experiments are done on a desktop PC equipped with an M2 Pro CPU and 16GB memory.
Software Dependencies | No | The paper does not explicitly state specific software dependencies with version numbers, such as programming languages or libraries, used for the experiments.
Experiment Setup | Yes | We set the probability parameter β = 0.1. Each experiment is repeated 20 times and we record the average running time and relative error compared to the true sum. We discard the top/lower 10% errors before computing the average error. The fraction of users in the conservative and liberal groups are set to 0.05 and 0.95, respectively. The privacy level for the users in the conservative and liberal groups are drawn uniformly at random from the ranges [ 1 n, 1] and [1, 100], respectively
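The evaluation procedure quoted under Experiment Setup can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the trimming is assumed to drop 10% of the runs from each end of the sorted errors, the group sizes are assumed to be exact fractions of the user count, and the lower end of the conservative range is left as a parameter (`conservative_low`) rather than fixed to a value.

```python
import random


def relative_error(estimate, true_sum):
    """Relative error of an estimate compared to the true sum."""
    return abs(estimate - true_sum) / abs(true_sum)


def trimmed_mean_error(errors, trim_fraction=0.10):
    """Average the errors after dropping the lowest and highest
    trim_fraction of runs (assumption: trimming is applied to each end)."""
    ordered = sorted(errors)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def sample_privacy_levels(n_users, conservative_low, rng,
                          conservative_fraction=0.05):
    """Draw one privacy level per user.

    Conservative users draw uniformly from [conservative_low, 1],
    liberal users from [1, 100]. conservative_low is a hypothetical
    parameter standing in for the lower bound of the conservative range.
    """
    n_conservative = round(n_users * conservative_fraction)
    levels = [rng.uniform(conservative_low, 1.0)
              for _ in range(n_conservative)]
    levels += [rng.uniform(1.0, 100.0)
               for _ in range(n_users - n_conservative)]
    return levels


if __name__ == "__main__":
    rng = random.Random(0)
    levels = sample_privacy_levels(1000, 0.001, rng)
    # Placeholder errors standing in for 20 repetitions of a protocol.
    errors = [rng.random() * 0.1 for _ in range(20)]
    print(len(levels), trimmed_mean_error(errors))
```

With 20 repetitions and a 10% trim on each side, the average is taken over the 16 middle runs.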