Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Optimal Survey Design for Private Mean Estimation
Authors: Yu-Wei Chen, Raghu Pasupathy, Jordan Awan
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We numerically illustrate our method through simulation studies. Section 5.1 compares variances between naive and DP-aware stratified sampling. Section 5.2 explores the interplay between the non-private and purely DP designs. Section 5.4 showcases the computational efficiency of our algorithm. The input of Algorithm 1, $x^*$, is obtained by package nloptr and alabama in R. All computations, including runtime measurements, were conducted on the Purdue Bell clusters using multiple cores. The source codes are available at https://github.com/garyUAchen/DP_Optim_Survey. |
| Researcher Affiliation | Academia | Department of Statistics, Purdue University, West Lafayette, IN, USA. Correspondence to: Jordan Awan <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Integer-Optimal Design. Input: $x^*$ (the optimal continuous solution) and the Hessian matrix of $g$, $H_g(x^*)$. For $i = 1, \dots, k-1$: define $T_i = \{n_i \in \mathbb{N} : \lfloor x_i^* \rfloor \le n_i \le \lceil x_i^* \rceil\}$. Define $T = \{(n_1, \dots, n_{k-1}, n_k) : n_k = \eta - \sum_{i=1}^{k-1} n_i, \text{ where } (n_1, \dots, n_{k-1}) \in T_1 \times \cdots \times T_{k-1}\}$. Select $n_{\text{init.}} = \arg\min_{n \in T} g(n)$. Calculate the smallest eigenvalue $\lambda$ of $H_g(x^*)$. Calculate the radius $r = \sqrt{2\,(g(n_{\text{init.}}) - g(x^*))/\lambda}$. For $i = 1, \dots, k-1$: define $S_i = \{n_i \in \mathbb{N} : \max(x_i^* - r, 1) \le n_i \le \min(x_i^* + r, N_i, \eta - k + 1)\}$. Define $S = \{(n_1, \dots, n_{k-1}, n_k) : n_k = \eta - \sum_{i=1}^{k-1} n_i, \text{ where } (n_1, \dots, n_{k-1}) \in S_1 \times \cdots \times S_{k-1}\}$. Select $n^* = \arg\min_{n \in S} g(n)$ by an exhaustive search. Output: $n^*$. |
| Open Source Code | Yes | The source codes are available at https://github.com/garyUAchen/DP_Optim_Survey. |
| Open Datasets | No | The paper describes simulation scenarios with synthetic parameters for population sizes and variances, such as: "In this simulation, there are 4 groups with population sizes N = (7000, 8000, 9000, 10000) and variance σ² = (0.08, 0.082, 0.083, 0.084) and a total sample size η = 200." There is no mention of external public datasets or access information for any dataset. |
| Dataset Splits | No | The paper describes simulation setups using synthetic parameters, not a pre-existing dataset that would require splitting into training, validation, or test sets. Therefore, no dataset split information is provided. |
| Hardware Specification | No | All computations, including runtime measurements, were conducted on the Purdue Bell clusters using multiple cores. Although the cluster is named, the CPU model, exact number of cores, and memory are not given, so the hardware is not described in enough detail to count as a specification. |
| Software Dependencies | No | The input of Algorithm 1, $x^*$, is obtained by package nloptr and alabama in R. This indicates the use of R and specific packages (nloptr and alabama), but no version numbers for R or the packages are provided. |
| Experiment Setup | Yes | In this simulation, there are 4 groups with population sizes N = (7000, 8000, 9000, 10000) and variance σ² = (0.08, 0.082, 0.083, 0.084) and a total sample size η = 200. We plot the variance ratio from a naive subsampling scheme to that of the integer-optimal design while varying ϵ from 0.01 to 100. |
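
The Pseudocode row above can be read as a three-step search: round the continuous optimum, bound a search radius using the smallest Hessian eigenvalue, then search the resulting integer box exhaustively. The sketch below illustrates those steps in Python (the authors' own code is in R and lives in the linked repository). It makes several assumptions: the function and helper names are hypothetical; the objective `g` is a stand-in (the classical non-private stratified-sampling variance), not the paper's DP-aware objective; and keeping `n_init` as a fallback candidate is a defensive addition not stated in the algorithm. The population sizes, variances, and total sample size are the ones quoted in the Experiment Setup row.

```python
import itertools
import math

import numpy as np


def integer_optimal_design(g, x_star, hessian, N, eta):
    """Search integer allocations near the continuous optimum x_star.

    g       : objective taking a length-k allocation (assumed convex)
    x_star  : continuous minimizer of g subject to sum(x) == eta
    hessian : Hessian of g at x_star (assumed positive definite)
    N       : stratum population sizes; eta : total sample size
    """
    k = len(x_star)

    def complete(head):
        # The last stratum receives whatever is left of the budget eta.
        return tuple(head) + (eta - sum(head),)

    def feasible(n):
        return all(1 <= n_i <= N_i for n_i, N_i in zip(n, N))

    # Step 1: floor/ceiling candidates T and the starting point n_init.
    T = [sorted({math.floor(x), math.ceil(x)}) for x in x_star[:-1]]
    rounded = [n for n in map(complete, itertools.product(*T)) if feasible(n)]
    n_init = min(rounded, key=g)

    # Step 2: search radius from the smallest Hessian eigenvalue.
    lam = float(np.linalg.eigvalsh(np.asarray(hessian, dtype=float)).min())
    r = math.sqrt(max(2.0 * (g(n_init) - g(x_star)), 0.0) / lam)

    # Step 3: exhaustive search over the integer box S around x_star.
    S = [
        range(max(math.ceil(x - r), 1), min(math.floor(x + r), N_i, eta - k + 1) + 1)
        for x, N_i in zip(x_star[:-1], N[:-1])
    ]
    box = (n for n in map(complete, itertools.product(*S)) if feasible(n))
    # Assumption: n_init is kept as a fallback so the search set is never empty.
    return min(itertools.chain([n_init], box), key=g)


# Simulation parameters quoted in the Experiment Setup row.
N = [7000, 8000, 9000, 10000]
sigma2 = [0.08, 0.082, 0.083, 0.084]
eta = 200
weights = [N_i / sum(N) for N_i in N]


def g(n):
    # Stand-in objective (NOT the paper's g): classical stratified variance.
    return sum(w**2 * s2 / n_i for w, s2, n_i in zip(weights, sigma2, n))


# Neyman allocation minimizes this stand-in over real n with sum(n) == eta.
scale = sum(N_i * math.sqrt(s2) for N_i, s2 in zip(N, sigma2))
x_star = [eta * N_i * math.sqrt(s2) / scale for N_i, s2 in zip(N, sigma2)]
hessian = np.diag([2 * w**2 * s2 / x**3 for w, s2, x in zip(weights, sigma2, x_star)])

n_opt = integer_optimal_design(g, x_star, hessian, N, eta)
print(n_opt, g(n_opt))
```

Because the radius shrinks as the rounded starting point gets closer to the continuous optimum, the exhaustive search in the last step typically covers only a handful of integer points per stratum. Replacing the stand-in `g`, `x_star`, and `hessian` with the DP-aware quantities from the paper would be needed to reproduce the reported results.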