A Self-Correcting Variable-Metric Algorithm for Stochastic Optimization

Authors: Frank E. Curtis

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments illustrate that the method and a limited memory variant of it are stable and outperform (mini-batch) stochastic gradient and other quasi-Newton methods when employed to solve a few machine learning problems.
Researcher Affiliation | Academia | Frank E. Curtis (FRANK.E.CURTIS@GMAIL.COM), Department of ISE, Lehigh University, 200 W. Packer Ave., Bethlehem, PA 18015 USA
Pseudocode | Yes | Algorithm SC-BFGS: Self-Correcting BFGS (the self-correcting update is illustrated in the first code sketch after the table).
Open Source Code | Yes | Code for running SC, SC-s, SC-L, and SC-L-s is publicly available (footnote 1: http://coral.ise.lehigh.edu/frankecurtis/software/).
Open Datasets | Yes | a1a from the LIBSVM website (footnote 2: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/) with (19) using a logistic loss function. ... rcv1(.binary) data from LIBSVM. ... 10-class mnist dataset. (A data-loading sketch is the second code example after the table.)
Dataset Splits | No | The paper mentions using a 'training set' and 'testing set' but does not specify explicit training/validation/test splits, percentages, or absolute counts for reproducibility, and it does not explicitly refer to a 'validation set' for hyperparameter tuning.
Hardware Specification | Yes | All experiments were run using Matlab R2014b on a Macbook Air with a 1.7 GHz Intel Core i7 processor and 8GB of RAM.
Software Dependencies | Yes | Algorithms SC-BFGS and SC-L-BFGS were implemented in Matlab... All experiments were run using Matlab R2014b.
Experiment Setup | Yes | For all algorithms, diminishing stepsize sequences of the form α_k = ω_0/(ω_1 + k) for all k ∈ ℕ (20) were tested for all combinations of ω_0 ∈ {2^0, 2^2, 2^4} and ω_1 ∈ {2^0, 2^2, 2^4}, and sequences of fixed stepsizes, i.e., α_k = ω_2 for all k ∈ ℕ (21), were tested for ω_2 ∈ {2^-4, 2^-2, 2^0, 2^2, 2^4}. For all SC* algorithms, all combinations of η ∈ {2^-2, 2^-4, 2^-6} and θ ∈ {2^0, 2^2} were tested. For SC*-s, all combinations of ρ ∈ {2^-2, 2^-1}·η and τ ∈ {2^1, 2^2}·θ were tested, though the choices k̂_max = 2 and σ = 0 were fixed. For oBFGS and oLBFGS, following (Schraudolph et al., 2007), the stochastic gradient displacement vectors were computed for the sample S_k in iteration k ∈ ℕ as ... The values ω_3 ∈ {2^-6, 2^-4, 2^-2, 2^0} were tested. For all limited-memory methods, m = 5 was used. All stochastic gradient estimates were computed by randomly selecting 64 samples uniformly from the training set. (These grids are enumerated in the third sketch after the table.)
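The pseudocode itself is not reproduced here, but the self-correcting idea it refers to can be illustrated. Below is a minimal Python sketch (the paper's implementation is in Matlab): it damps the curvature pair (s, y) so the inverse-Hessian approximation stays well-conditioned, with bounds governed by the η and θ parameters from the experiment grid. The damping rule, the scan over β, and all function names are illustrative assumptions, not the paper's Algorithm SC-BFGS.

```python
import numpy as np

def self_correct_pair(s, y, eta=0.25, theta=4.0):
    """Hypothetical damping: blend y toward s until the pair (s, yt)
    satisfies  eta <= (s . yt)/||s||^2  and  ||yt||^2/(s . yt) <= theta.
    Both inequalities hold at beta = 1 (yt = s) whenever eta <= 1 <= theta,
    so the scan always terminates with a usable pair."""
    ss = s @ s
    for beta in np.linspace(0.0, 1.0, 21):
        yt = beta * s + (1.0 - beta) * y
        sy = s @ yt
        if sy >= eta * ss and yt @ yt <= theta * sy:
            return yt
    return s  # beta = 1 fallback (unreachable when eta <= 1 <= theta)

def bfgs_inverse_update(H, s, yt):
    """Standard BFGS update of the inverse-Hessian approximation H
    using the (possibly damped) pair (s, yt)."""
    rho = 1.0 / (s @ yt)
    V = np.eye(len(s)) - rho * np.outer(s, yt)
    return V @ H @ V.T + rho * np.outer(s, s)

def sc_bfgs_step(x, H, grad, alpha, eta, theta):
    """One stochastic quasi-Newton step: x+ = x - alpha * H g, then update
    H with the self-corrected pair. The paper's precise displacement
    computation differs; a plain gradient difference is used here for
    illustration only."""
    g = grad(x)
    x_new = x - alpha * (H @ g)
    s = x_new - x
    yt = self_correct_pair(s, grad(x_new) - g, eta, theta)
    return x_new, bfgs_inverse_update(H, s, yt)
```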
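The datasets are distributed in LIBSVM's svmlight text format. The following is a hedged sketch of the setup described in the Open Datasets row, assuming (19) in the paper denotes an L2-regularized logistic-regression objective and that a1a has been downloaded locally from the LIBSVM page cited above; the regularizer lam is an illustrative choice, not a value from the paper.

```python
import numpy as np
from sklearn.datasets import load_svmlight_file

# Load a1a (svmlight format, downloaded beforehand from the LIBSVM page);
# X is a sparse CSR matrix, y holds labels in {-1, +1}.
X, y = load_svmlight_file("a1a")

def logistic_loss(w, X, y, lam=1e-4):
    """Average logistic loss with an (assumed) L2 regularizer lam;
    np.logaddexp(0, -m) computes log(1 + exp(-m)) stably."""
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins)) + 0.5 * lam * (w @ w)

def logistic_grad(w, X, y, lam=1e-4):
    """Gradient of the loss above: mean of -y_i * sigmoid(-m_i) * x_i
    plus lam * w."""
    margins = y * (X @ w)
    coeff = -y / (1.0 + np.exp(margins))
    return (X.T @ coeff) / X.shape[0] + lam * w

w0 = np.zeros(X.shape[1])
print(logistic_loss(w0, X, y))  # equals log(2) + 0 at w = 0
```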
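The stepsize schedules and hyperparameter grids in the Experiment Setup quote can be enumerated directly. The sketch below follows only the quoted values; the helper names are hypothetical, and whether the 64-sample draw is with or without replacement is not stated in the excerpt.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# (20) Diminishing stepsizes alpha_k = w0 / (w1 + k), over all
# combinations of w0, w1 in {2^0, 2^2, 2^4}.
diminishing = [lambda k, w0=w0, w1=w1: w0 / (w1 + k)
               for w0, w1 in itertools.product(
                   [2.0**p for p in (0, 2, 4)], repeat=2)]

# (21) Fixed stepsizes alpha_k = w2 for w2 in {2^-4, 2^-2, 2^0, 2^2, 2^4}.
fixed = [lambda k, w2=w2: w2
         for w2 in (2.0**p for p in (-4, -2, 0, 2, 4))]

# SC* grid: (eta, theta) over {2^-2, 2^-4, 2^-6} x {2^0, 2^2}.
sc_grid = list(itertools.product(
    (2.0**-2, 2.0**-4, 2.0**-6), (2.0**0, 2.0**2)))

def minibatch_indices(n_train, batch=64):
    """64 samples drawn uniformly from the training set, as quoted
    (with replacement here; the excerpt does not say which)."""
    return rng.integers(0, n_train, size=batch)
```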