Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Stochastic Variance-Reduced Newton: Accelerating Finite-Sum Minimization with Large Batches
Authors: Michał Dereziński
TMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We next demonstrate numerically that SVRN can be effectively used to accelerate stochastic Newton methods in practice. We also show how variance reduction can be incorporated into a globally convergent Subsampled Newton method in a way that is robust to hyperparameters and preserves its scalability thanks to large-batch operations. 5.1 Logistic regression experiment: In this section, we present numerical experiments for solving a regularized logistic loss minimization task. For an n × d data matrix A with rows a_i, an n-dimensional target vector y (with ±1 entries y_i) and a regularization parameter γ, our task is to minimize $\sum_{i=1}^n \log(1 + e^{-y_i a_i^\top x}) + \frac{\gamma}{2}\|x\|^2$. As a dataset, we used the Extended MNIST dataset of handwritten digits (EMNIST, Cohen et al., 2017) with n = 500k datapoints, transformed using a random features map (with dimension d = 1000). Experimental details, as well as further results on the CIFAR-10 dataset and several synthetic data matrices, are presented in Appendix B. In Figure 2, we compared SVRN-HA to three baselines which are most directly comparable: (1) the classical Newton's method; (2) SVRG with the step size and number of inner iterations tuned for best wall-clock time; and (3) Subsampled Newton with Hessian Averaging (SN-HA), i.e., the method we use in the global phase of Algorithm 1 (without the SVRN phase). All of the convergence plots are averaged over 10 runs. |
| Researcher Affiliation | Academia | Michał Dereziński EMAIL Department of Electrical Engineering and Computer Science University of Michigan |
| Pseudocode | Yes | Algorithm 1: SVRN with Hessian Averaging (SVRN-HA) |
| Open Source Code | Yes | 1The code is available at https://github.com/svrnewton/svrn. |
| Open Datasets | Yes | As a dataset, we used the Extended MNIST dataset of handwritten digits (EMNIST, Cohen et al., 2017) with n = 500k datapoints... Experimental details, as well as further results on the CIFAR-10 dataset and several synthetic data matrices, are presented in Appendix B. |
| Dataset Splits | No | The paper mentions using EMNIST and CIFAR-10 datasets and transforming them, but does not explicitly provide the training/test/validation splits used for the experiments. It states "partitioned the classes into two labels 1 and -1" which is a data preprocessing step, not a dataset split. |
| Hardware Specification | No | The paper discusses parallel complexity and the benefits of large mini-batches for 'most computing architectures' and 'single- and multi-core architectures' but does not specify any particular hardware (e.g., GPU, CPU models, or cloud instances) used for running the experiments described in Section 5. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or programming languages used in the implementation of their methods or experiments. |
| Experiment Setup | Yes | In this section, we present numerical experiments for solving a regularized logistic loss minimization task... regularization parameter γ, our task is to minimize $\sum_{i=1}^n \log(1 + e^{-y_i a_i^\top x}) + \frac{\gamma}{2}\|x\|^2$. As a dataset, we used the Extended MNIST dataset... with n = 500k datapoints, transformed using a random features map (with dimension d = 1000). ... we used the regularization parameter γ = 10⁻⁸. ... For both SVRN-HA and SN-HA we use Hessian sample size k = 4d. Algorithm 1... with local iterations t_max = log₂(n/d) and gradient batch size m = n/log₂(n/d) |
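The objective and hyperparameter choices quoted in the table above can be sketched concretely. The snippet below is a minimal illustration, not the authors' implementation: the function names and structure are assumptions, while the loss formula and the batch-size formulas (k = 4d, t_max = log₂(n/d), m = n/log₂(n/d)) follow the excerpts quoted in the "Research Type" and "Experiment Setup" rows.

```python
import numpy as np

def logistic_loss(x, A, y, gamma):
    """Regularized logistic loss quoted in the report:
    sum_i log(1 + exp(-y_i a_i^T x)) + (gamma/2) * ||x||^2.
    """
    margins = -y * (A @ x)
    # logaddexp(0, t) computes log(1 + e^t) stably for large |t|
    return np.logaddexp(0.0, margins).sum() + 0.5 * gamma * (x @ x)

def svrn_batch_sizes(n, d):
    """Hyperparameter choices reported in the experiment setup
    (rounding to an integer is an illustrative assumption)."""
    t_max = int(np.log2(n / d))  # local iterations, log2(n/d)
    m = n // t_max               # gradient batch size, n / log2(n/d)
    k = 4 * d                    # Hessian sample size, 4d
    return t_max, m, k

# The EMNIST setup quoted above: n = 500k datapoints, feature dimension d = 1000
t_max, m, k = svrn_batch_sizes(n=500_000, d=1000)
print(t_max, m, k)  # → 8 62500 4000
```

At the reported scale (n = 500k, d = 1000), each outer step thus uses a Hessian sample of 4,000 rows and gradient mini-batches of 62,500 rows over 8 local iterations, which is consistent with the paper's emphasis on large-batch operations.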