Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Stochastic Variance-Reduced Newton: Accelerating Finite-Sum Minimization with Large Batches
Authors: Michał Dereziński
TMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We next demonstrate numerically that SVRN can be effectively used to accelerate stochastic Newton methods in practice. We also show how variance reduction can be incorporated into a globally convergent Subsampled Newton method in a way that is robust to hyperparameters and preserves its scalability thanks to large-batch operations. 5.1 Logistic regression experiment: In this section, we present numerical experiments for solving a regularized logistic loss minimization task. For an n × d data matrix A with rows a_i, an n-dimensional target vector y (with ±1 entries y_i) and a regularization parameter γ, our task is to minimize $\sum_{i=1}^n \log(1 + e^{-y_i a_i^\top x}) + \frac{\gamma}{2}\|x\|^2$. As a dataset, we used the Extended MNIST dataset of handwritten digits (EMNIST, Cohen et al., 2017) with n = 500k datapoints, transformed using a random features map (with dimension d = 1000). Experimental details, as well as further results on the CIFAR-10 dataset and several synthetic data matrices, are presented in Appendix B. In Figure 2, we compared SVRN-HA to three baselines which are most directly comparable: (1) the classical Newton's method; (2) SVRG with the step size and number of inner iterations tuned for best wall-clock time; and (3) Subsampled Newton with Hessian Averaging (SN-HA), i.e., the method we use in the global phase of Algorithm 1 (without the SVRN phase). All of the convergence plots are averaged over 10 runs. |
| Researcher Affiliation | Academia | Michał Dereziński EMAIL Department of Electrical Engineering and Computer Science University of Michigan |
| Pseudocode | Yes | Algorithm 1: SVRN with Hessian Averaging (SVRN-HA) |
| Open Source Code | Yes | 1The code is available at https://github.com/svrnewton/svrn. |
| Open Datasets | Yes | As a dataset, we used the Extended MNIST dataset of handwritten digits (EMNIST, Cohen et al., 2017) with n = 500k datapoints... Experimental details, as well as further results on the CIFAR-10 dataset and several synthetic data matrices, are presented in Appendix B. |
| Dataset Splits | No | The paper mentions using EMNIST and CIFAR-10 datasets and transforming them, but does not explicitly provide the training/test/validation splits used for the experiments. It states "partitioned the classes into two labels 1 and -1" which is a data preprocessing step, not a dataset split. |
| Hardware Specification | No | The paper discusses parallel complexity and the benefits of large mini-batches for 'most computing architectures' and 'single- and multi-core architectures' but does not specify any particular hardware (e.g., GPU, CPU models, or cloud instances) used for running the experiments described in Section 5. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or programming languages used in the implementation of their methods or experiments. |
| Experiment Setup | Yes | In this section, we present numerical experiments for solving a regularized logistic loss minimization task... regularization parameter γ, our task is to minimize $\sum_{i=1}^n \log(1 + e^{-y_i a_i^\top x}) + \frac{\gamma}{2}\|x\|^2$. As a dataset, we used the Extended MNIST dataset... with n = 500k datapoints, transformed using a random features map (with dimension d = 1000). ... we used the regularization parameter γ = 10⁻⁸. ... For both SVRN-HA and SN-HA we use Hessian sample size k = 4d. Algorithm 1... with local iterations t_max = log₂(n/d) and gradient batch size m = n/log₂(n/d) |
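The objective and hyperparameter choices quoted in the table above can be sketched concretely. The snippet below is a minimal illustration, not the authors' implementation: the function names and structure are assumptions, while the loss formula and the batch-size formulas (k = 4d, t_max = log₂(n/d), m = n/log₂(n/d)) follow the excerpts quoted in the "Research Type" and "Experiment Setup" rows.

```python
import numpy as np

def logistic_loss(x, A, y, gamma):
    """Regularized logistic loss quoted in the report:
    sum_i log(1 + exp(-y_i a_i^T x)) + (gamma/2) * ||x||^2.
    """
    margins = -y * (A @ x)
    # logaddexp(0, t) computes log(1 + e^t) stably for large |t|
    return np.logaddexp(0.0, margins).sum() + 0.5 * gamma * (x @ x)

def svrn_batch_sizes(n, d):
    """Hyperparameter choices reported in the experiment setup
    (rounding to an integer is an illustrative assumption)."""
    t_max = int(np.log2(n / d))  # local iterations, log2(n/d)
    m = n // t_max               # gradient batch size, n / log2(n/d)
    k = 4 * d                    # Hessian sample size, 4d
    return t_max, m, k

# The EMNIST setup quoted above: n = 500k datapoints, feature dimension d = 1000
t_max, m, k = svrn_batch_sizes(n=500_000, d=1000)
print(t_max, m, k)  # → 8 62500 4000
```

At the reported scale (n = 500k, d = 1000), each outer step thus uses a Hessian sample of 4,000 rows and gradient mini-batches of 62,500 rows over 8 local iterations, which is consistent with the paper's emphasis on large-batch operations.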