Stochastic Variance-Reduced Cubic Regularized Newton Methods

Authors: Dongruo Zhou, Pan Xu, Quanquan Gu

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Thorough experiments on various non-convex optimization problems support our theory. [...] In this section, we present numerical experiments on different non-convex Empirical Risk Minimization (ERM) problems and on different datasets to validate the advantage of our SVRC algorithm in finding approximate local minima. [...] From Figures 1, 2 and 3, we can see that our algorithm SVRC outperforms all the other baseline algorithms on all the datasets.
Researcher Affiliation | Academia | Dongruo Zhou¹, Pan Xu¹, Quanquan Gu¹. ¹Department of Computer Science, University of California, Los Angeles, CA 90095, USA. Correspondence to: Quanquan Gu <qgu@cs.ucla.edu>.
Pseudocode | Yes | Algorithm 1 Stochastic Variance Reduction Cubic Regularization (SVRC) 1: Input: batch size $b_g$, $b_h$, cubic penalty parameter $\{M_{s,t}\}$, epoch number $S$, epoch length $T$ and starting point $x_0$. 2: Initialization $\hat{x}^1 = x_0$ 3: for $s = 1, \ldots, S$ do...
Open Source Code | No | The paper does not provide any specific links to open-source code or explicitly state that the code for their method is publicly available.
Open Datasets | Yes | The datasets we use are a9a, covtype, ijcnn1, which are common datasets used in ERM problems. The detailed information about these datasets are in Table 2. (Table 2 lists 'a9a', 'covtype', 'ijcnn1' with sample size and dimension).
Dataset Splits | No | The paper does not explicitly provide details about training, validation, or test splits for the datasets used in the experiments.
Hardware Specification | No | The paper does not specify any hardware used for the experiments, such as CPU or GPU models, or other specific machine configurations.
Software Dependencies | No | The paper mentions using a 'Lanczos-type method' for the subproblem solver but does not provide specific software names with version numbers for any dependencies.
Experiment Setup | Yes | Parameters and subproblem solver: For each algorithm and each dataset, we choose different $b_g$, $b_h$, $T$ for the best performance. Meanwhile, we choose $M_{s,t} = \alpha/(1+\beta)^{(s+t/T)}$, $\alpha, \beta > 0$ for each iteration. [...] We set $\alpha = 0.05$, $\beta = 0$ for a9a and ijcnn1 datasets and $\alpha$ = 5e3, $\beta = 0.15$ for covtype.
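
The cubic penalty schedule quoted in the Experiment Setup row can be illustrated with a short Python sketch. This is not the authors' code (none is released): it reads the quoted rule as a numerator constant, written alpha here, divided by (1 + beta) raised to the power (s + t/T). The function names, the `SETTINGS` table, and the choice of inner index t = 0, ..., T-1 are assumptions made for illustration; only the constants 0.05, 5e3, 0, and 0.15 come from the quoted text.

```python
# Hypothetical sketch of the quoted cubic penalty schedule.
# alpha is the assumed name of the numerator constant in the quoted formula.

def cubic_penalty(alpha, beta, s, t, T):
    """Return alpha / (1 + beta) ** (s + t / T) for epoch s and inner step t."""
    if alpha <= 0 or beta < 0 or T <= 0:
        raise ValueError("expected alpha > 0, beta >= 0, T > 0")
    return alpha / (1.0 + beta) ** (s + t / T)


# (alpha, beta) pairs quoted for each dataset.
SETTINGS = {
    "a9a": (0.05, 0.0),
    "ijcnn1": (0.05, 0.0),
    "covtype": (5e3, 0.15),
}


def penalty_schedule(dataset, S, T):
    """Penalties for epochs s = 1..S and inner steps t = 0..T-1 (assumed indexing)."""
    alpha, beta = SETTINGS[dataset]
    return [
        [cubic_penalty(alpha, beta, s, t, T) for t in range(T)]
        for s in range(1, S + 1)
    ]


if __name__ == "__main__":
    for row in penalty_schedule("covtype", S=3, T=4):
        print(["%.1f" % m for m in row])
```

With beta = 0 (the a9a and ijcnn1 settings) the penalty stays constant at alpha, while with beta = 0.15 (covtype) it decays geometrically as the epoch and inner step advance.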