Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Stochastic Variance-Reduced Newton: Accelerating Finite-Sum Minimization with Large Batches

Authors: Michał Dereziński

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We next demonstrate numerically that SVRN can be effectively used to accelerate stochastic Newton methods in practice. We also show how variance reduction can be incorporated into a globally convergent Subsampled Newton method in a way that is robust to hyperparameters and preserves its scalability thanks to large-batch operations. 5.1 Logistic regression experiment: In this section, we present numerical experiments for solving a regularized logistic loss minimization task. For an n × d data matrix A with rows a_i, an n-dimensional target vector y (with ±1 entries y_i) and a regularization parameter γ, our task is to minimize: (1/n) Σ_{i=1}^n log(1 + e^{-y_i a_i^T x}) + (γ/2) ||x||^2. As a dataset, we used the Extended MNIST dataset of handwritten digits (EMNIST; Cohen et al., 2017) with n = 500k datapoints, transformed using a random features map (with dimension d = 1000). Experimental details, as well as further results on the CIFAR-10 dataset and several synthetic data matrices, are presented in Appendix B. In Figure 2, we compared SVRN-HA to three baselines which are most directly comparable: (1) the classical Newton's method; (2) SVRG with the step size and number of inner iterations tuned for best wall-clock time; and (3) Subsampled Newton with Hessian Averaging (SN-HA), i.e., the method we use in the global phase of Algorithm 1 (without the SVRN phase). All of the convergence plots are averaged over 10 runs.
Researcher Affiliation Academia Michał Dereziński, EMAIL, Department of Electrical Engineering and Computer Science, University of Michigan
Pseudocode Yes Algorithm 1: SVRN with Hessian Averaging (SVRN-HA)
Open Source Code Yes The code is available at https://github.com/svrnewton/svrn.
Open Datasets Yes As a dataset, we used the Extended MNIST dataset of handwritten digits (EMNIST, Cohen et al., 2017) with n = 500k datapoints... Experimental details, as well as further results on the CIFAR-10 dataset and several synthetic data matrices, are presented in Appendix B.
Dataset Splits No The paper mentions using EMNIST and CIFAR-10 datasets and transforming them, but does not explicitly provide the training/test/validation splits used for the experiments. It states "partitioned the classes into two labels 1 and -1" which is a data preprocessing step, not a dataset split.
Hardware Specification No The paper discusses parallel complexity and the benefits of large mini-batches for 'most computing architectures' and 'single- and multi-core architectures' but does not specify any particular hardware (e.g., GPU, CPU models, or cloud instances) used for running the experiments described in Section 5.
Software Dependencies No The paper does not provide specific version numbers for any software dependencies, libraries, or programming languages used in the implementation of their methods or experiments.
Experiment Setup Yes In this section, we present numerical experiments for solving a regularized logistic loss minimization task... regularization parameter γ, our task is to minimize: (1/n) Σ_{i=1}^n log(1 + e^{-y_i a_i^T x}) + (γ/2) ||x||^2. As a dataset, we used the Extended MNIST dataset... with n = 500k datapoints, transformed using a random features map (with dimension d = 1000). ... we used the regularization parameter γ = 10^-8. ... For both SVRN-HA and SN-HA we use Hessian sample size k = 4d. Algorithm 1... with local iterations t_max = log2(n/d) and gradient batch size m = n / log2(n/d).
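The Experiment Setup row above quotes the regularized logistic objective and the SVRN-HA hyperparameter choices (k = 4d, t_max = log2(n/d), m = n / log2(n/d)). A minimal sketch of both follows; note the (1/n)-averaged loss with an ℓ2 penalty of (γ/2)||x||^2 is the standard form assumed here (the extracted excerpt truncates the regularizer), and the function names are illustrative, not from the paper's code.

```python
import numpy as np

def logistic_loss(x, A, y, gamma):
    """Regularized logistic loss, assumed form:
    (1/n) * sum_i log(1 + exp(-y_i a_i^T x)) + (gamma/2) ||x||^2."""
    margins = y * (A @ x)                      # y_i * a_i^T x for each row
    return np.mean(np.log1p(np.exp(-margins))) + 0.5 * gamma * np.dot(x, x)

def svrn_ha_hyperparams(n, d):
    """Hyperparameters quoted in the Experiment Setup row:
    Hessian sample size k = 4d, local iterations t_max = log2(n/d),
    gradient batch size m = n / log2(n/d) (rounded to integers here)."""
    t_max = int(np.log2(n / d))
    return {"k": 4 * d, "t_max": t_max, "m": n // t_max}
```

For the paper's setting (n = 500k, d = 1000), this gives k = 4000, t_max = 8, and m = 62500; the integer rounding is a choice made here for the sketch.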