Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Private Geometric Median in Nearly-Linear Time

Authors: Syamantak Kumar, Daogao Liu, Kevin Tian, Chutong Yang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present empirical evidence supporting the efficacy of our techniques. We conduct experiments on Algorithm 1 (the radius estimation step of Section 3) and Algorithm 3, to evaluate how subsampled estimates and DP-SGD respectively improve the performance of our algorithm.
Researcher Affiliation | Collaboration | Syamantak Kumar (University of Texas at Austin); Daogao Liu (Google Research); Kevin Tian (University of Texas at Austin); Chutong Yang (University of Texas at Austin)
Pseudocode | Yes | Algorithm 1: Fast Radius($D, r, R, \epsilon, \delta$); Algorithm 2: Fast Center($D, \hat r, \epsilon, \delta$); Algorithm 3: Stable DPSGD($D, x, \hat r, \rho, \delta, \eta, T$)
Open Source Code | Yes | Answer: [Yes]. Justification: As described above, we provide reproducible code in the supplemental material.
Open Datasets | No | We use two types of synthetic datasets with outliers, described in Appendix E: Gaussian Cluster (also used in [HSU24]) and Heavy Tailed (a multivariate Student's t distribution).
Dataset Splits | No | The paper describes how to generate synthetic datasets and evaluates the algorithms on them using error metrics rather than traditional training/validation/test splits. Geometric median estimation does not inherently require such splits, and no split percentages or splitting methodology are provided.
Hardware Specification | Yes | Our subsampling experiments were performed on a single Google Colab CPU, and our boosting experiments were performed on a personal Apple M4 with 16GB RAM.
Software Dependencies | No | The paper does not list specific software libraries or frameworks with version numbers (e.g., Python, PyTorch, or CUDA versions) used for the implementation.
Experiment Setup | Yes | To satisfy $\rho$-CDP, Algorithm 3 in [HSU24] recommends a constant step size of $\eta_{\text{base}} = 2\hat r \sqrt{\frac{d}{6\rho n^2}}$, where $\hat r$ is the estimated radius. We examined performance using various step sizes $\eta = \eta_{\text{base}} \cdot \eta_{\text{multiplier}}$ with multipliers $\eta_{\text{multiplier}} \in \{0.5, 1, 10, 30, 50, 100\}$. ... Based on these experiments, we select $\eta_{\text{multiplier}} = 30$ for DPGD in our experiments... In all our experiments, we set $d = 50$, $\rho = 0.5$, and vary $n \in \{100, 1000, 10000\}$. For the Gaussian Cluster dataset, we set $\sigma = 0.1$ and vary the bounding radius $R \in \{25, 50, 100\}$. We set our estimated initial radius $\hat r = 20\sigma\sqrt{d}$ and initialize all algorithms at a uniformly random point on the surface of $\mathbb{B}^d(0.75\hat r)$.
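The Experiment Setup row quotes concrete hyperparameters, and the arithmetic they imply can be sketched in a few lines. This is a minimal Python sketch under stated assumptions, not the authors' code. The base step size is read as $\eta_{\text{base}} = 2\hat r\sqrt{d/(6\rho n^2)}$ (worth checking against [HSU24]). The ball $\mathbb{B}^d(0.75\hat r)$ is assumed to be centered at the origin. The helper names `estimated_radius`, `base_step_size`, and `random_point_on_sphere` are hypothetical.

```python
import math

import numpy as np

# Hyperparameters quoted in the Experiment Setup row.
D_DIM = 50                      # dimension d
RHO = 0.5                       # rho-CDP privacy parameter
SIGMA = 0.1                     # Gaussian Cluster spread sigma
N_VALUES = (100, 1000, 10000)   # dataset sizes n
MULTIPLIERS = (0.5, 1, 10, 30, 50, 100)  # eta_multiplier grid


def estimated_radius(sigma: float, d: int) -> float:
    """Initial radius estimate r_hat = 20 * sigma * sqrt(d)."""
    return 20.0 * sigma * math.sqrt(d)


def base_step_size(r_hat: float, d: int, rho: float, n: int) -> float:
    """Assumed reading of eta_base = 2 * r_hat * sqrt(d / (6 * rho * n^2))."""
    return 2.0 * r_hat * math.sqrt(d / (6.0 * rho * n ** 2))


def random_point_on_sphere(d: int, radius: float, rng: np.random.Generator) -> np.ndarray:
    """Uniform point on the origin-centered sphere of the given radius.

    A standard Gaussian vector is rotation-invariant, so normalizing it
    gives a uniformly random direction; scaling sets the radius.
    """
    g = rng.standard_normal(d)
    return radius * g / np.linalg.norm(g)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    r_hat = estimated_radius(SIGMA, D_DIM)
    # Hypothetical initialization on the surface of B^d(0.75 * r_hat).
    x0 = random_point_on_sphere(D_DIM, 0.75 * r_hat, rng)
    print(f"r_hat = {r_hat:.4f}, ||x0|| = {np.linalg.norm(x0):.4f}")
    for n in N_VALUES:
        eta_base = base_step_size(r_hat, D_DIM, RHO, n)
        etas = [eta_base * m for m in MULTIPLIERS]
        print(n, round(eta_base, 5), [round(e, 4) for e in etas])
```

Under this reading, $\eta_{\text{base}}$ shrinks as $1/n$, so for large $n$ the base step is very small. That would be consistent with the authors settling on a multiplier as large as 30, though the source does not state this reasoning.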