Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Stochastic Gradients under Nuisances
Authors: Facheng Yu, Ronak Mehta, Alex Luedtke, Zaid Harchaoui
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our claims are theoretical convergence guarantees for various optimization algorithms. The results are included in Sec. 3 and the proofs are written in the appendix. G Numerical Experiments This section provides numerical experiments of the proposed stochastic methods in this paper. In Appx. G.1, we design a numerical experiment to illustrate our orthogonalization method. In Appx. G.2, we design simulations based on a partially linear model. In Appx. G.3, we conduct a real data analysis with synthetic outcome to evaluate the performance of our methods. Code for reproduction can be found at https://fachengyu.github.io/. |
| Researcher Affiliation | Academia | Facheng Yu, Ronak Mehta, Alex Luedtke, Zaid Harchaoui; University of Washington. |
| Pseudocode | No | The paper describes methods and algorithms using mathematical formulations and prose (e.g., 'Specifically, we use SGD as the parameter estimator. We define $\theta^{(0,n)} = \theta^{(0)} \in \Theta$ and $\theta^{(i,0)} = \theta^{(i-1,n)}$ for $1 \le i \le m$, and produce the sequence $\theta^{(i,1)}, \ldots, \theta^{(i,n)}$ using $n$ steps of the SGD update (8) initialized at $\theta^{(i,0)}$.'). No explicitly labeled 'Pseudocode' or 'Algorithm' block or code-like formatted procedure is provided. |
| Open Source Code | Yes | The code is provided on GitHub. Code for reproduction can be found at https://fachengyu.github.io/. |
| Open Datasets | Yes | We consider the Diabetes 130-Hospitals Dataset [Clore et al., 2014] as the real dataset example. We use six of these features as covariates, which are summarized in Tab. 4. |
| Dataset Splits | No | Observe i.i.d. samples $\{Z_i\}_{i=1}^{m}$ for the nuisance estimation and i.i.d. samples $\{Z_i\}_{i=1}^{n}$ for the parameter estimation. To estimate nuisances using stream data, instead of fitting a Ridge regression each time, we perform SGD for the Ridge regression loss. The procedure can be summarized as: 1. Initialize the RFF sampler with 20 components using $n_0$ i.i.d. samples $(W_i)_{i=1}^{n_0}$ from $P_{W \mid \lambda}$. 2. Perform an SGD update once a mini-batch of size $n_g$ of i.i.d. samples from the joint distribution $P_{X,W,Y \mid \lambda}$ is observed. |
| Hardware Specification | No | Our numerical illustration is not computationally prohibitive, and can run on an instance of Google Colab. |
| Software Dependencies | No | This success has been fueled by machine learning and AI software libraries such as JAX, PyTorch, TensorFlow, and others, which offer a wide range of SGD variants, as long as a loss function can be clearly specified. |
| Experiment Setup | Yes | The learning rates of all three SGDs are fixed during training. Ridge regressions where the regularization parameter is set to 0.01/m. For nuisances estimated using stream data, we allow the procedure to be repeated by plugging in updated nuisance estimators and an updated operator estimator, where the nuisances are updated for 2000 iterations after every 2000 target SGD iterations. For nuisances estimated using stream data, we update the nuisances for 100 iterations after every 500 target SGD iterations. |
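
The interleaved schedule quoted in the Experiment Setup row (a block of target SGD iterations followed by a block of nuisance iterations, with fixed learning rates) can be illustrated with a minimal sketch. This is not the authors' code: the toy model, the function name `alternating_sgd`, the ground-truth values, and the learning rates are all assumptions chosen for illustration. Here the nuisance is a scalar intercept estimated from its own data stream, and the target is a scalar slope.

```python
import random


def alternating_sgd(num_rounds, target_steps=500, nuisance_steps=100,
                    lr_target=0.05, lr_nuisance=0.05, seed=0):
    """Toy alternating SGD: target block first, then nuisance block.

    Hypothetical model (an assumption, not taken from the paper):
        target stream:   y = eta_star + theta_star * x + noise
        nuisance stream: w = eta_star + noise
    """
    rng = random.Random(seed)
    theta_star, eta_star = 2.0, 1.0   # hypothetical ground truth
    theta, eta = 0.0, 0.0             # initial estimates
    counts = {"target": 0, "nuisance": 0}

    for _ in range(num_rounds):
        # Target block: SGD on 0.5 * (y - eta - theta * x)^2 with respect
        # to theta, plugging in the current nuisance estimate eta.
        for _ in range(target_steps):
            x = rng.gauss(0.0, 1.0)
            y = eta_star + theta_star * x + rng.gauss(0.0, 0.1)
            residual = y - eta - theta * x
            theta += lr_target * residual * x   # fixed learning rate
            counts["target"] += 1

        # Nuisance block: SGD on 0.5 * (eta - w)^2 using a separate stream.
        for _ in range(nuisance_steps):
            w = eta_star + rng.gauss(0.0, 0.5)
            eta -= lr_nuisance * (eta - w)      # fixed learning rate
            counts["nuisance"] += 1

    return theta, eta, counts


if __name__ == "__main__":
    theta_hat, eta_hat, n_updates = alternating_sgd(num_rounds=4)
    print(theta_hat, eta_hat, n_updates)
```

The defaults mirror the 500/100 schedule from the table; passing `target_steps=2000, nuisance_steps=2000` gives the other schedule mentioned there. In this toy model the covariate has mean zero, so an inaccurate intercept does not bias the slope update in expectation, which is only a loose analogy to the orthogonalization the paper discusses.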