Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

The Gaussian Mixing Mechanism: Renyi Differential Privacy via Gaussian Sketches

Authors: Omri Lev, Vishwak Srinivasan, Moshe Shenfeld, Katrina Ligett, Ayush Sekhari, Ashia C Wilson

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirically, our methods improve performance across multiple datasets and, in several cases, reduce runtime. We validate these theoretical results through empirical evaluations, where our method consistently outperforms the baselines of Sheffet [2017] and Wang [2018] across several benchmark datasets. As Figure 2 shows, our method outperforms the AdaSSP method and the method of [Sheffet, 2017], achieving lower or equal test MSE across all privacy levels on every dataset. As shown in Figure 3, our approach delivers lower runtime than both baselines and achieves consistent accuracy gains over objective perturbation.
Researcher Affiliation	Academia	1 Massachusetts Institute of Technology 2 The Hebrew University of Jerusalem 3 Boston University
Pseudocode	Yes	Algorithm 1 Modified Gauss Mix. Algorithm 2 Linear Mixing. Algorithm 3 AdaSSP. Algorithm 4 Sheffet's Algorithm (Original). Algorithm 5 Sheffet's Algorithm with Our Analysis. Algorithm 6 Objective Perturbation.
Open Source Code	Yes	Correspondence: EMAIL. Code: https://github.com/omrilev1/GaussMix.
Open Datasets	Yes	To demonstrate the usefulness of Algorithm 2, we have simulated its performance on four different datasets: the Communities & Crime dataset [Redmond and Baveja, 2002], the Tecator dataset [Thodberg, 2015], a synthetic dataset comprised of Gaussian features transformed via an MLP and another dataset comprised of Gaussian features. We conducted experiments on the Fashion-MNIST [Xiao et al., 2017] and the CIFAR100 [Krizhevsky and Hinton, 2009] datasets, using the implementations provided in torchvision.datasets.
Dataset Splits	Yes	The first two are real-world datasets: the Tecator dataset [Thodberg, 2015] and the Communities and Crime dataset [Redmond and Baveja, 2002]. We have used a random train-test split of 80%/20% for generating a train and a test set. For both synthetic datasets, the train and test sets were generated independently, using the same fixed θ0 but with independent covariates and additive noise. We used the standard PyTorch train/test splits and normalized the training data by the maximum L2 norm across all training samples, ensuring that each training sample has a norm of at most 1. The same normalization factor was then applied to the test set.
Hardware Specification	Yes	All the linear regression experiments were run on 12th Gen Intel(R) Core(TM) i7-1255U, and all the logistic experiments were run on an NVIDIA A100 GPU.
Software Dependencies	No	Our non-private baseline is the standard Logistic Regression solver from the sklearn.linear_model library. The private baselines are the objective perturbation method (described in Appendix E.2), where the minimization is carried out using torch.optim.LBFGS with a maximum of 500 iterations and a tolerance of $10^{-6}$, following the setup of [Guo et al., 2020], and DP-SGD [Abadi et al., 2016] as implemented in Opacus [Yousefpour et al., 2021] with a batch size of 1024, 10 epochs, and a learning rate of 0.5. We conducted experiments on the Fashion-MNIST [Xiao et al., 2017] and the CIFAR100 [Krizhevsky and Hinton, 2009] datasets, using the implementations provided in torchvision.datasets.
Experiment Setup	Yes	The baseline (non-private) estimator was computed as $\hat{\theta} = (X^\top X + \lambda I_d)^{-1} X^\top Y$ for $\lambda = 10^{-6}$, ensuring invertibility in all cases. We report the mean squared error (MSE) for both the train and the test set, computed as the squared error in predicting $y_i$ via $x_i^\top \hat{\theta}$. All results are averaged over 250 independent trials, and we report both the empirical means and confidence intervals. We further simulated a variant of Algorithm 1 from [Sheffet, 2017] that uses the analysis established in Lemma 1. We also fixed the parameter k at 4.5d. The network architecture used is a compact convolutional neural network for RGB image classification. It consists of two convolutional layers with ReLU activations and max pooling, reducing the input to a 64-channel feature map of size 8 × 8. The flattened features are passed through a fully connected layer with 128 hidden units and ReLU, followed by a final linear layer that outputs class logits. In both of the experiments, we have first trained this network end-to-end using the DP-SGD primitive implemented in Opacus [Yousefpour et al., 2021], where we have set the clipping parameter to 4.0, learning rate to 0.001, the number of epochs to 20, and the batch size to 500.
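
Illustrative sketch (not taken from the paper's code): the Dataset Splits row above describes scaling the training data by the maximum L2 norm over all training samples and reusing the same factor on the test set. The standard-library Python below shows one way that step might look. The function names and the toy data are hypothetical, and the zero-norm guard is an added assumption.

```python
import math


def l2_norm(row):
    """Euclidean norm of one sample (a list of floats)."""
    return math.sqrt(sum(v * v for v in row))


def normalize_by_max_train_norm(train, test):
    """Scale both sets by 1 / (largest L2 norm among the training samples).

    After scaling, every training sample has norm at most 1. Test samples
    reuse the training factor, so their norms may exceed 1.
    """
    max_norm = max(l2_norm(row) for row in train)
    if max_norm == 0.0:
        # Assumption: an all-zero training set is left unscaled.
        return [list(r) for r in train], [list(r) for r in test], 1.0
    scale = 1.0 / max_norm
    train_scaled = [[v * scale for v in row] for row in train]
    test_scaled = [[v * scale for v in row] for row in test]
    return train_scaled, test_scaled, scale


# Hypothetical toy data: two training samples and one test sample.
train = [[3.0, 4.0], [1.0, 0.0]]
test = [[6.0, 8.0]]
train_n, test_n, scale = normalize_by_max_train_norm(train, test)
```

Computing the factor from the training set alone keeps the test set out of preprocessing, which matches the description of applying "the same normalization factor" to the test set.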