Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

PREAMBLE: Private and Efficient Aggregation via Block Sparse Vectors

Authors: Hilal Asi, Vitaly Feldman, Hannah Keller, Guy Rothblum, Kunal Talwar

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we give empirical evidence demonstrating that PREAMBLE is accurate, private, and communication efficient. Blocking improves communication cost: Figure 4a shows the required communication, or DPF key size... Blocking reduces Truncation Error... Blocking is compatible with Privacy: We next evaluate the privacy-utility trade-off of our approach... In Fig. 5a, we plot the overhead in the per-batch noise standard deviation... Finally, we show some end-to-end experiments for private learning (Fig. 5b and Fig. 5c). We report results for using DP-SGD (with momentum) on MNIST [LCB10] and CIFAR-10 [Kri09]...
Researcher Affiliation	Collaboration	Hilal Asi Apple Vitaly Feldman Apple Hannah Keller Aarhus University Guy N. Rothblum Apple Kunal Talwar Apple
Pseudocode	Yes	Figure 3: Informal Description of our approach to Approximate Aggregation via k-block sparse vectors (a) k-wise 1-sparse sampling scheme (b) k-sparse sampling scheme Figure 6: Pseudocode for Subsampling algorithms. Figure 7: Gen generates DPF keys for a k-block-sparse vector of dimension d + log B with blocks of size B and security parameter λ, where G is a PRG that takes an input of size λ bits and outputs a bit string of length B log \|G\|. The values β of the non-zero entries in the vector correspond to elements of group G. Figure 8: Gen Next computes the seed and bit components of nodes at the next tree layer, where G is a PRG that each take an input of size λ bits and outputs a bit string of length 2(λ + 2). Figure 9: Eval evaluates one path x given a DPF key kb corresponding to server b for k-blocksparse vectors with block size B, where G and G are PRGs that each take an input of size λ bits and output a bit string of length 2(λ + 2) and B log \|G\|, respectively. g defines which correction word will be applied in the first layer. Also, let convert be a function that takes as input a bit string of length log \|G\| and outputs a group element in G.
Open Source Code	No	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: Our main contribution is algorithmic, and the algorithm is described with very detailed pseudocode. The experiments in the paper are fully described. We evaluate on standard ML benchmarks.
Open Datasets	Yes	For MNIST, we use the model from [AFN+23] which has 69050 trainable parameters. For CIFAR, we train a simple two-layer neural network with 66954 parameters on CLIP [RKH+21] embeddings. [LCB10] Yann Le Cun, Corinna Cortes, and CJ Burges. Mnist handwritten digit database. ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2, 2010. [Kri09] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009. [RKH+21] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 8748 8763. PMLR, 18 24 Jul 2021. Model at https://github.com/openai/CLIP/.
Dataset Splits	No	Section J.1, MNIST experiment (Figure 5b): We run DP-SGD with fixed learning rate 0.1, momentum 0.9, and batch size 600 for 10 epochs. Section J.1, CIFAR10 experiment (Figure 5c): Then, we run DP-SGD with initial learning rate 4.0, momentum 0.9, weight decay 5 10 4 and full batch for 10 epochs. The paper specifies batch sizes (600, full batch) for training, but does not explicitly state how the original MNIST or CIFAR-10 datasets were split into training, validation, and test sets. It does not provide percentages, sample counts for splits, or reference to predefined splits with citations for dataset partitioning.
Hardware Specification	Yes	All of our experiments were run locally on a Macbook Pro equipped with Apple M1 Pro chip (with 10 cores), and 32GB RAM.
Software Dependencies	No	We calculate the standard deviation of the noise required for the Gaussian mechanism using the dp-accounting library in python.
Experiment Setup	Yes	MNIST experiment (Figure 5b). For our experiment that compares the standard deviation of the noise of the Gaussian mechanism and our approach, we consider a private training setup where we have a model of size D = 220 106 and number of data points n = 6 105. We user a standard setting of the parameters of DP-SGD where we have clipping norm 1.0, ε = 1.0, δ = 10 6, and number of epochs is 10. We calculate the standard deviation of the noise required for the Gaussian mechanism using the dp-accounting library in python. For our approach, we calculate the standard deviation based on our description in the paper using the Renyi DP accounting techniques of [FS25b] (for the remove direction) and [DCO25] (for the add direction) together with RDP bounds for Poisson subsampling in [ZW19] (we rely on RDP-based bounds since (ε, δ)-based bounds are worse when composition over numerous batches is required). For our method, we fix the communication cost k B = 32768 and then plot the ratio of the noise needed for our method over the noise of the Gaussian mechanism as a function of the block size B. We repeat this for different values of batch size in {512, 1028, 4096, 6 105} and report the results in Figure 5a. MNIST experiment (Figure 5b). For MNIST, we follow the experimental setup of [AFN+23] and train a neural network with 69050 parameters (see full description in Table 1). We run DP-SGD with fixed learning rate 0.1, momentum 0.9, and batch size 600 for 10 epochs. To privatize the gradients at each batch, we clip each individual gradient to have ℓ2 norm at most 1 and use the standard Gaussian mechanism or our partitioned subsampling approach to release private gradients. We calculate the standard deviation of the noise required for the Gaussian mechanism using the dp accounting library in python. We set our privacy parameters to be ε = 2.0 and δ = 10 6. For our partitioned subsampling approach, we use a block size B = 920 and number of blocks k = 20, and clip the ℓ2 norm of each block to be at most L = 1.02 p B/D where D = 69050 is the number of parameters in the model. We repeat this process 10 times, each time recoding the accuracy per epoch for each method, and plot the median accuracy with 90% confidence intervals in Figure 5b. CIFAR10 experiment (Figure 5c). For CIFAR, we produce CLIP embedding (using the version Vi T-B/32) for the CIFAR10 images and train a simple two-layer neural network with 66954 parameters: our network is a sequence of two fully connected layers: the first has dimensions 512 128 and the second 128 10. Then, we run DP-SGD with initial learning rate 4.0, momentum 0.9, weight decay 5 10 4 and full batch for 10 epochs. We use a step LR scheduler for the learning rate which reduces the learning rate by a factor of 0.9 every 5 epochs. Similarly to MNIST, we use the Gaussian mechanism and our partitioned subsampling approach to release private gradients, where we have clipping norm 1 for the gradients, ε = 2.0 and δ = 10 6. For our partitioned subsampling approach, we use a block size B = 920 and number of blocks k = 25, and clip the ℓ2 norm of each block to be at most L = 1.02 p B/D where D = 66954 is the number of parameters in the model. We repeat this process 10 times, each time recoding the accuracy per epoch for each method, and plot the median accuracy with 90% confidence intervals in Figure 5c.