Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Optimized Tradeoffs for Private Prediction with Majority Ensembling

Authors: Shuli Jiang, Qiuyi Zhang, Gauri Joshi

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Lastly, we demonstrate the strong empirical effectiveness of our first-of-its-kind privacy-constrained utility optimization for ensembling labels for private prediction from private teachers in image classification. Notably, our DaRRM framework with an optimized γ exhibits substantial utility gains when compared against several baselines. Experiments. In downstream tasks, such as semi-supervised knowledge transfer for private image classification, we compare our DaRRM with an optimized γ to compute the private label majority from private teachers against PATE (Papernot et al., 2018), which computes the private label majority from non-private teachers.
Researcher Affiliation	Collaboration	Shuli Jiang EMAIL Robotics Institute, Carnegie Mellon University Qiuyi (Richard) Zhang EMAIL Google DeepMind Gauri Joshi EMAIL Electrical and Computer Engineering, Carnegie Mellon University
Pseudocode	Yes	Algorithm 1 DaRRM( ): Data-dependent Randomized Response Majority
Open Source Code	Yes	All code for the experiments can be found at https://anonymous.4open.science/r/Optimized Private Majority-CF50
Open Datasets	Yes	We use samples from two randomly chosen classes, class 5 and 8, from the MNIST and Fashion-MNIST datasets to form our training and testing datasets.
Dataset Splits	Yes	Our MNIST has a total of 11272 training samples and 1866 testing samples; our Fashion-MNIST has 10000 training samples and 2000 testing samples. ... We train K = 11 teachers on equally divided subsets of the training datasets.
Hardware Specification	No	The paper mentions 'In practice, we observe with the Gurobi optimizer, one can optimize γ for K 41 on a laptop if δ > 0.' This refers to a practical limitation for optimization, not the specific hardware used for running the primary experiments or model training, and no specific model is mentioned.
Software Dependencies	No	The paper mentions 'Gurobi solver' and 'DP-SGD Abadi et al. (2016)' but does not provide specific version numbers for these software components or any other libraries or programming languages used.
Experiment Setup	Yes	DaRRM Setup: The Gaussian noise in DP-SGD has zero mean and std. σdpsgd = 12; the gradient norm clipping threshold is C = 1. ... We set the privacy allowance m = 35 ... We train K = 11 teachers ... for 5 epochs. ... We pick Q ∈ {20, 50, 100}.
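
To make the quoted setup values concrete, here is a minimal sketch of how the training data might be divided among K = 11 teachers and how one DP-SGD-style noisy gradient could be formed with σdpsgd = 12 and C = 1. It is not the authors' code. The function names, the choice to drop leftover samples so all subsets are equal, and the noise scale of σ·C added to the summed clipped gradients (the usual DP-SGD formulation of Abadi et al., 2016) are assumptions made for illustration.

```python
import math
import random

# Values quoted in the "Experiment Setup" and "Dataset Splits" rows.
# The constant names are illustrative, not taken from the paper's code.
SIGMA_DPSGD = 12.0   # std. of the Gaussian noise in DP-SGD
CLIP_C = 1.0         # gradient norm clipping threshold
K_TEACHERS = 11      # number of teachers


def split_equally(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k disjoint,
    equally sized subsets.

    Assumption: the (n_samples mod k) leftover indices are dropped so that
    every subset has exactly the same size.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    size = n_samples // k
    return [indices[i * size:(i + 1) * size] for i in range(k)]


def dp_sgd_average(per_sample_grads, clip_c=CLIP_C, sigma=SIGMA_DPSGD, rng=None):
    """Clip each per-sample gradient to L2 norm clip_c, sum the clipped
    gradients, add zero-mean Gaussian noise with std sigma * clip_c to each
    coordinate, and divide by the batch size.

    Assumption: this follows the standard DP-SGD recipe; the paper's exact
    training loop may differ.
    """
    rng = rng or random.Random(0)
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for grad in per_sample_grads:
        norm = math.sqrt(sum(x * x for x in grad))
        scale = min(1.0, clip_c / norm) if norm > 0 else 1.0
        for j in range(dim):
            total[j] += grad[j] * scale
    noisy = [t + rng.gauss(0.0, sigma * clip_c) for t in total]
    return [v / len(per_sample_grads) for v in noisy]


if __name__ == "__main__":
    # 11272 is the MNIST training-set size quoted in the "Dataset Splits" row.
    subsets = split_equally(11272, K_TEACHERS)
    print([len(s) for s in subsets])
    print(dp_sgd_average([[3.0, 4.0], [0.1, 0.2]]))
```

With noise of standard deviation 12 against a clipping threshold of 1, a single noisy gradient is dominated by noise, so any signal comes only from averaging over a batch and over many steps.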