A Self-Correcting Variable-Metric Algorithm for Stochastic Optimization
Authors: Frank Curtis
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments illustrate that the method and a limited memory variant of it are stable and outperform (mini-batch) stochastic gradient and other quasi-Newton methods when employed to solve a few machine learning problems. |
| Researcher Affiliation | Academia | Frank E. Curtis FRANK.E.CURTIS@GMAIL.COM Department of ISE, Lehigh University, 200 W. Packer Ave., Bethlehem, PA 18015 USA |
| Pseudocode | Yes | Algorithm SC-BFGS: Self-Correcting BFGS |
| Open Source Code | Yes | Code for running SC, SC-s, SC-L, and SC-L-s is publicly available (footnote 1: http://coral.ise.lehigh.edu/frankecurtis/software/). |
| Open Datasets | Yes | a1a from the LIBSVM website (footnote 2: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/) with (19) using a logistic loss function. ... rcv1(.binary) data from LIBSVM. ... 10-class mnist dataset. |
| Dataset Splits | No | The paper mentions using a 'training set' and 'testing set' but does not specify explicit training/validation/test dataset splits, percentages, or absolute counts for reproducibility. It does not explicitly refer to a 'validation set' for hyperparameter tuning. |
| Hardware Specification | Yes | All experiments were run using Matlab R2014b on a Macbook Air with a 1.7 GHz Intel Core i7 processor and 8GB of RAM. |
| Software Dependencies | Yes | Algorithms SC-BFGS and SC-L-BFGS were implemented in Matlab... All experiments were run using Matlab R2014b |
| Experiment Setup | Yes | For all algorithms, diminishing stepsize sequences of the form α_k = ω₀/(ω₁ + k) for all k ∈ ℕ (Eq. 20) were tested for all combinations of ω₀ ∈ {2^0, 2^2, 2^4} and ω₁ ∈ {2^0, 2^2, 2^4}, and sequences of fixed stepsizes, i.e., α_k = ω₂ for all k ∈ ℕ (Eq. 21), were tested for ω₂ ∈ {2^−4, 2^−2, 2^0, 2^2, 2^4}. For all SC* algorithms, all combinations of η ∈ {2^−2, 2^−4, 2^−6} and θ ∈ {2^0, 2^2} were tested. For SC*-s, all combinations of ρ ∈ {2^−2, 2^−1}·η and τ ∈ {2^1, 2^2}·θ were tested, though the choices k̂_max = 2 and σ = 0 were fixed. For oBFGS and oLBFGS, following (Schraudolph et al., 2007), the stochastic gradient displacement vectors were computed for the sample S_k in iteration k ∈ ℕ as... The values ω₃ ∈ {2^−6, 2^−4, 2^−2, 2^0} were tested. For all limited memory methods, m = 5 was used. All stochastic gradient estimates were computed by randomly selecting 64 samples uniformly from the training set. |
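The stepsize schedules and hyperparameter grids quoted above can be sketched as follows. This is a hypothetical reconstruction for illustration, not the authors' released Matlab code; the names `diminishing_grid`, `fixed_grid`, and `sc_grid` are our own.

```python
import itertools

# Diminishing stepsizes (Eq. 20): alpha_k = w0 / (w1 + k),
# over all combinations of w0, w1 in {2^0, 2^2, 2^4} (9 configurations).
diminishing_grid = [
    (w0, w1) for w0, w1 in itertools.product([2**0, 2**2, 2**4], repeat=2)
]

def diminishing_stepsize(w0, w1, k):
    """Stepsize at iteration k for the diminishing schedule (Eq. 20)."""
    return w0 / (w1 + k)

# Fixed stepsizes (Eq. 21): alpha_k = w2 for w2 in {2^-4, 2^-2, 2^0, 2^2, 2^4}.
fixed_grid = [2.0**e for e in (-4, -2, 0, 2, 4)]

# SC* hyperparameter grid: eta in {2^-2, 2^-4, 2^-6}, theta in {2^0, 2^2}
# (6 combinations per stepsize configuration).
sc_grid = list(itertools.product([2**-2, 2**-4, 2**-6], [2**0, 2**2]))

print(len(diminishing_grid), len(fixed_grid), len(sc_grid))  # 9 5 6
```

Under this reading, a full sweep for an SC* method pairs each of the 9 diminishing (or 5 fixed) stepsize settings with each of the 6 (η, θ) combinations.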