Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Differentially Private High-dimensional Variable Selection via Integer Programming
Authors: Petros Prastakos, Kayhan Behdin, Rahul Mazumder
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We complement our theoretical findings with extensive numerical experiments, using both least squares and hinge loss for our objective function, and demonstrate that our methods achieve state-of-the-art empirical support recovery, outperforming competing algorithms in settings with up to $p = 10^4$. |
| Researcher Affiliation | Collaboration | Petros Prastakos, Operations Research Center, MIT, Cambridge, MA 02139, USA, EMAIL; Kayhan Behdin, LinkedIn, Sunnyvale, CA 94085, USA, EMAIL |
| Pseudocode | Yes | Algorithm 1: Top-R method. 1: procedure $\hat{M}(D, s, b_x, b_y, r, R, T)$ ... Algorithm 2: Outer approximation for $\hat{S}_k(D)$. 1: procedure $A(D, \lambda, r, s, a, b, \mathrm{tol})$ |
| Open Source Code | Yes | Code is available at https://github.com/petrosprastakos/DP-variable-selection. |
| Open Datasets | No | In our experiments, we draw the data points as $y_i = x_i^T \beta + \epsilon_i$ for $i \in [n]$, where $x_1, \ldots, x_n \overset{\text{iid}}{\sim} N(0, \Sigma) \in \mathbb{R}^p$ and the independent noise follows $\epsilon \sim N(0, \sigma^2 I_n)$, where $I_n$ is the identity matrix of size $n$. Moreover, for $i, j \in [p]$, we set $\Sigma_{i,j} = \rho^{\lvert i - j \rvert}$ and set nonzero coordinates of $\beta$ to take value $1/\sqrt{s}$ at indices $\{1, 3, \ldots, 2s - 1\}$. |
| Dataset Splits | Yes | In an effort to compare prediction accuracy across methods, we performed a 70/30 random train/test split and implemented Algorithm 2 in [23] on the training data, using half of the privacy budget (i.e., $\epsilon/2$) for variable selection with the top-R, mistakes, Samp-Agg, or MCMC methods, and the remaining half for model optimization via objective perturbation (Algorithm 1 in [23]) to obtain the regression coefficients under the privacy budget ($\beta_{\text{priv}}$). |
| Hardware Specification | Yes | All experiments were conducted on a computing cluster using 20 cores and 64 GB RAM. |
| Software Dependencies | No | The Gurobi Optimizer is used under the Gurobi End User License Agreement. CVXPY is distributed under the Apache License, Version 2.0. The ABESS package is distributed under the GNU General Public License, Version 3. |
| Experiment Setup | Yes | In Algorithm 1, we set $R = 2 + (p - s)s$, $b_x = b_y = 0.5$, $r = 1.1$ and T = for all our experiments in this paper. In Algorithm 2, we set $a = 0.001$, $b = 0.005$, $r = 1.1$ and $\mathrm{tol} = 0.005$, and consider various values of the other parameters. The penalty parameter $\lambda$ in Algorithm 2 was set to 600 and 170 for Figures 1a and 1b, respectively, and the number of MCMC iterations was set to 100,000 for Figure 1a. |
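
The synthetic data model quoted in the Open Datasets row and the 70/30 split quoted in the Dataset Splits row can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `make_synthetic_data` and `split_70_30`, the seeds, and the example values of `n`, `p`, `s`, `rho`, and `sigma` are hypothetical, and the sketch reads the garbled excerpt as $\Sigma_{i,j} = \rho^{\lvert i - j \rvert}$ with nonzero coefficients equal to $1/\sqrt{s}$. It does not implement any differentially private step.

```python
import numpy as np


def make_synthetic_data(n, p, s, rho, sigma, seed=0):
    """Draw (X, y, beta) from the sparse linear model y = X beta + noise.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    if 2 * s - 1 > p:
        raise ValueError("need 2s - 1 <= p to place s nonzeros at odd indices")
    rng = np.random.default_rng(seed)

    # Covariance with entries Sigma[i, j] = rho ** |i - j|.
    idx = np.arange(p)
    Sigma = rho ** np.abs(idx[:, None] - idx[None, :])

    # Rows of X are iid N(0, Sigma): multiply standard normals by a Cholesky factor.
    chol = np.linalg.cholesky(Sigma)
    X = rng.standard_normal((n, p)) @ chol.T

    # Nonzero coefficients 1/sqrt(s) at 1-based indices {1, 3, ..., 2s - 1},
    # i.e. 0-based positions 0, 2, ..., 2s - 2.
    beta = np.zeros(p)
    beta[0:2 * s:2] = 1.0 / np.sqrt(s)

    # Independent Gaussian noise with standard deviation sigma.
    y = X @ beta + sigma * rng.standard_normal(n)
    return X, y, beta


def split_70_30(X, y, seed=0):
    """Random 70/30 train/test split (assumed to be a uniform row permutation)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))
    n_train = (7 * len(y)) // 10
    train, test = perm[:n_train], perm[n_train:]
    return X[train], y[train], X[test], y[test]


# Example values below are illustrative only, not settings from the paper.
X, y, beta = make_synthetic_data(n=200, p=50, s=5, rho=0.1, sigma=1.0)
X_train, y_train, X_test, y_test = split_70_30(X, y)
```

With this scaling the true coefficient vector has unit Euclidean norm for any `s`, which keeps the signal strength comparable across sparsity levels.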