Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Private Hyperparameter Tuning with Ex-Post Guarantee
Authors: Badih Ghazi, Pritish Kamath, Alexander Knop, Ravi Kumar, Pasin Manurangsi, Chiyuan Zhang
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 Experiments We present two sets of experiments: In the first, we evaluate the performance of our algorithm on analytical tasks and in the second, we focus on the performance on a machine learning problem. ... The detailed results can be seen in Table 1. |
| Researcher Affiliation | Industry | Badih Ghazi Google Research EMAIL Pritish Kamath Google Research EMAIL Alexander Knop Google Research EMAIL Ravi Kumar Google Research EMAIL Pasin Manurangsi Google Research EMAIL Chiyuan Zhang Google Research EMAIL |
| Pseudocode | Yes | Algorithm 1 Hyperparameter Tuning Mechanism with Random Dropping. Parameters: Distribution E, mechanisms M_i : D → O and budget parameters ε_i for i ∈ [d]. Input: Dataset D. S ← {⊥}; Sample k ∼ E; for i = 1, ..., d do: Sample y_i ∼ Ber(e^{−ε_i k}) {random drop}; if y_i = 1 then o_i ← M_i(D_i); S ← S ∪ {(o_i, i)}; return maximum element in S {as per the total order on (O × [d]) ∪ {⊥}} |
| Open Source Code | Yes | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: The datasets used in the paper are standard and the paper has the code in the supplemental materials. |
| Open Datasets | Yes | Reddit: We use the webis/tldr-17 dataset [Völske et al., 2017]... train a linear regression model on a dataset of timeseries generated by Twitter usage [The AMA Team at Laboratoire d'Informatique de Grenoble]... train a classifier for the MNIST dataset [LeCun et al., 2010]... train a classifier for the Gisette [Guyon et al., 2004] dataset |
| Dataset Splits | No | The paper mentions using a test set in ML settings (Footnote 5, "If the test set is considered sensitive..."). The ML experiments describe training models on datasets (Twitter, MNIST, Gisette). While it mentions training, it doesn't explicitly state the exact splits (e.g., "80/10/10 split") for these datasets in the provided text. It mentions using "test set" and "training" but no explicit split percentages or counts. |
| Hardware Specification | Yes | Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [Yes] Justification: all the experiments are performed on a personal laptop within 10 minutes each. |
| Software Dependencies | No | In both cases we use Opacus [Yousefpour et al., 2021] for training DP-SGD... The example of MNIST written using PyTorch. The paper mentions software tools like Opacus and PyTorch but does not provide specific version numbers for these dependencies, which is required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | (b) Algorithm 1 with the DP-SGD [Abadi et al., 2016] mechanism, learning linear models with ε = 0.01, possible values of ε in {0.1, 0.2, ..., 1}, learning rate in {0.01, 0.1, 1}, epochs in {1, 5, 10}, batch sizes in {32, 64, 128, 256, 512, 1000}, and clipping norms in {0.1, 1, 10}. |
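
The control flow of Algorithm 1, as quoted in the Pseudocode row, can be illustrated with a minimal Python sketch. It is not the authors' code and makes several assumptions: the drop probability is read as e^{−ε_i k} with k ≥ 0, every mechanism is called on the same dataset, mechanism outputs are comparable values so that `(output, index)` tuples give a total order, and `None` stands in for the bottom element ⊥. The names `tune_with_random_dropping` and `sample_k` are hypothetical.

```python
import math
import random
from typing import Any, Callable, Optional, Sequence, Tuple


def tune_with_random_dropping(
    dataset: Any,
    mechanisms: Sequence[Callable[[Any], float]],
    epsilons: Sequence[float],
    sample_k: Callable[[random.Random], float],
    rng: Optional[random.Random] = None,
) -> Optional[Tuple[float, int]]:
    """Run each candidate mechanism with probability exp(-eps_i * k).

    Returns the largest (output, index) pair among the mechanisms that
    were run, or None (standing in for the bottom element) if all were dropped.
    """
    rng = rng or random.Random()
    k = sample_k(rng)  # one draw from the distribution E (assumed k >= 0)
    best: Optional[Tuple[float, int]] = None
    for i, (mechanism, eps) in enumerate(zip(mechanisms, epsilons)):
        keep_prob = math.exp(-eps * k)  # assumed form of Ber(e^{-eps_i k})
        if rng.random() < keep_prob:  # random drop step
            candidate = (mechanism(dataset), i)
            # Tuples compare lexicographically: output first, then index.
            if best is None or candidate > best:
                best = candidate
    return best


if __name__ == "__main__":
    # Toy mechanisms that ignore the data and return fixed scores.
    toy = [lambda d: 1.0, lambda d: 3.0, lambda d: 2.0]
    print(tune_with_random_dropping([], toy, [0.1, 0.2, 0.3], lambda r: 0.0))
```

The sketch covers only the selection logic; it performs no privacy accounting, and the mechanisms passed in would themselves need to be differentially private (for example DP-SGD training runs over the grid listed under Experiment Setup).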