Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Differentially Private High-dimensional Variable Selection via Integer Programming

Authors: Petros Prastakos, Kayhan Behdin, Rahul Mazumder

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We complement our theoretical findings with extensive numerical experiments, using both least squares and hinge loss for our objective function, and demonstrate that our methods achieve state-of-the-art empirical support recovery, outperforming competing algorithms in settings with up to $p = 10^4$.
Researcher Affiliation | Collaboration | Petros Prastakos, Operations Research Center, MIT, Cambridge, MA 02139, USA, EMAIL; Kayhan Behdin, LinkedIn, Sunnyvale, CA 94085, USA, EMAIL
Pseudocode | Yes | Algorithm 1 Top-R method 1: procedure $\hat{M}(D, s, b_x, b_y, r, R, T)$ ... Algorithm 2 Outer approximation for $\hat{S}_k(D)$ 1: procedure $A(D, \lambda, r, s, a, b, \mathrm{tol})$
Open Source Code | Yes | Code is available at https://github.com/petrosprastakos/DP-variable-selection.
Open Datasets | No | In our experiments, we draw the data points as $y_i = x_i^T \beta + \epsilon_i$ for $i \in [n]$, where $x_1, \ldots, x_n \overset{\mathrm{iid}}{\sim} N(0, \Sigma) \in \mathbb{R}^p$ and the independent noise follows $\epsilon \sim N(0, \sigma^2 I_n)$ where $I_n$ is the identity matrix of size $n$. Moreover, for $i, j \in [p]$, we set $\Sigma_{i,j} = \rho^{|i-j|}$ and set nonzero coordinates of $\beta$ to take value $1/\sqrt{s}$ at indices $\{1, 3, \ldots, 2s-1\}$.
Dataset Splits | Yes | In an effort to compare prediction accuracy across methods, we performed a 70/30 random train/test split and implemented Algorithm 2 in [23] on the training data, using half of the privacy budget (i.e., $\epsilon/2$) for variable selection with the top-R, mistakes, Samp-Agg, or MCMC methods, and the remaining half for model optimization via objective perturbation (Algorithm 1 in [23]) to obtain the regression coefficients under the privacy budget ($\beta_{\mathrm{priv}}$).
Hardware Specification | Yes | All experiments were conducted on a computing cluster using 20 cores and 64 GB RAM.
Software Dependencies | No | The Gurobi Optimizer is used under the Gurobi End User License Agreement. CVXPY is distributed under the Apache License, Version 2.0. ABESS package is distributed under GNU General Public License, Version 3.
Experiment Setup | Yes | In Algorithm 1, we set $R = 2 + (p - s)s$, $b_x = b_y = 0.5$, $r = 1.1$ and T = for all our experiments in this paper. In Algorithm 2, we set $a = 0.001$, $b = 0.005$, $r = 1.1$ and $\mathrm{tol} = 0.005$, and consider various values of the other parameters. The penalty parameter $\lambda$ in Algorithm 2 was set to 600 and 170 for figures 1a and 1b, respectively, and the number of MCMC iterations was set to 100,000 for 1a.
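
The synthetic design quoted in the Open Datasets row and the 70/30 split quoted in the Dataset Splits row can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' code: the function names (`make_beta`, `draw_row`, `make_dataset`, `train_test_split`) and the example values of n, p, s, rho and sigma are hypothetical, and the covariance is read as Sigma[i][j] = rho^|i-j| with nonzero coefficients equal to 1/sqrt(s). Rows are sampled with an AR(1) recursion, which produces exactly that covariance when |rho| < 1.

```python
import math
import random


def make_beta(p, s):
    """Coefficients equal to 1/sqrt(s) at 1-based indices 1, 3, ..., 2s-1, else 0."""
    if 2 * s - 1 > p:
        raise ValueError("need 2s - 1 <= p")
    beta = [0.0] * p
    for k in range(s):
        beta[2 * k] = 1.0 / math.sqrt(s)  # 0-based 2k is 1-based 2k+1
    return beta


def draw_row(p, rho, rng):
    """One draw x ~ N(0, Sigma), Sigma[i][j] = rho**|i-j|, via an AR(1) recursion."""
    if abs(rho) >= 1:
        raise ValueError("need |rho| < 1")
    scale = math.sqrt(1.0 - rho * rho)  # keeps every coordinate at unit variance
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(1, p):
        x.append(rho * x[-1] + scale * rng.gauss(0.0, 1.0))
    return x


def make_dataset(n, p, s, rho, sigma, seed=0):
    """Draw (X, y) with y_i = x_i^T beta + eps_i and eps_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    beta = make_beta(p, s)
    X = [draw_row(p, rho, rng) for _ in range(n)]
    y = [sum(xj * bj for xj, bj in zip(x, beta)) + rng.gauss(0.0, sigma) for x in X]
    return X, y, beta


def train_test_split(X, y, train_frac=0.7, seed=0):
    """Random split of the rows into train and test parts (70/30 by default)."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    cut = round(train_frac * len(y))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train]), \
           ([X[i] for i in test], [y[i] for i in test])


# Example values below are assumptions chosen only for illustration.
X, y, beta = make_dataset(n=200, p=10, s=3, rho=0.1, sigma=1.0, seed=1)
(X_train, y_train), (X_test, y_test) = train_test_split(X, y)
```

With this scaling the coefficient vector has unit Euclidean norm regardless of s. The privacy-budget split described in the Dataset Splits row (half for variable selection, half for objective perturbation) is not modeled here.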