Barzilai-Borwein Step Size for Stochastic Gradient Descent

Authors: Conghui Tan, Shiqian Ma, Yu-Hong Dai, Yuqiu Qian

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Numerical experiments on standard data sets show that the performance of SGD-BB and SVRG-BB is comparable to and sometimes even better than SGD and SVRG with best-tuned step sizes, and is superior to some advanced SGD variants."
Researcher Affiliation | Academia | Conghui Tan (The Chinese University of Hong Kong, chtan@se.cuhk.edu.hk); Shiqian Ma (The Chinese University of Hong Kong, sqma@se.cuhk.edu.hk); Yu-Hong Dai (Chinese Academy of Sciences, Beijing, China, dyh@lsec.cc.ac.cn); Yuqiu Qian (The University of Hong Kong, qyq79@connect.hku.hk)
Pseudocode | Yes | Algorithm 1, "SVRG with BB step size (SVRG-BB)": parameters are the update frequency m, initial point x0, and initial step size η0 (only used in the first epoch); for k = 0, 1, ... do ... Algorithm 2, "SGD with BB step size (SGD-BB)": parameters are the update frequency m, initial step sizes η0 and η1 (only used in the first two epochs), weighting parameter β ∈ (0, 1), and initial point x0; for k = 0, 1, ... do ... (A minimal illustrative sketch of SVRG-BB is given after this table.)
Open Source Code | No | The paper does not include an explicit statement or link to the source code for the described methodology.
Open Datasets | Yes | "We tested SVRG-BB and SGD-BB on three standard real data sets, which were downloaded from the LIBSVM website" (www.csie.ntu.edu.tw/~cjlin/libsvmtools/). Detailed information on the data sets is given in Table 1 of the paper.
Dataset Splits | No | The paper mentions using "standard real data sets" but does not specify exact percentages or counts for training, validation, or test splits, so it does not provide enough detail to reproduce the data partitioning.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers required to reproduce the experiments.
Experiment Setup | Yes | "For both SVRG-BB and SVRG, we set m = 2n as suggested in [10]." For SGD-BB: "We set m = n, β = 10/m and η1 = η0 in our experiments. We used φ(k) = k + 1 when applying the smoothing technique."
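Since no code is released (see the Open Source Code row), the following is a minimal sketch of how the SVRG-BB update quoted in the Pseudocode row can be realized. It assumes the standard SVRG epoch structure and a BB step size of the form η_k = ||x̃_k − x̃_{k−1}||² / (m · (x̃_k − x̃_{k−1})ᵀ(g̃_k − g̃_{k−1})); the function names (svrg_bb, grad_full, grad_i), the n_epochs argument, and the numerical safeguards are illustrative assumptions, not the authors' implementation. SGD-BB's gradient averaging with β and the smoothing with φ(k) = k + 1 are not sketched here.

```python
import numpy as np


def svrg_bb(grad_full, grad_i, x0, n, m, eta0, n_epochs=20, rng=None):
    """Sketch of SVRG with a Barzilai-Borwein (BB) step size (SVRG-BB).

    grad_full(x)  -- full gradient (1/n) * sum_i grad f_i(x)
    grad_i(x, i)  -- stochastic gradient grad f_i(x) for component i
    eta0          -- step size used only in the first epoch
    This interface is an assumption for illustration, not the paper's code.
    """
    rng = np.random.default_rng() if rng is None else rng
    x_tilde = np.asarray(x0, dtype=float).copy()
    eta = eta0
    x_prev, g_prev = None, None

    for k in range(n_epochs):
        g = grad_full(x_tilde)              # full gradient at the snapshot point
        if k > 0:
            # BB step size: eta_k = ||s_k||^2 / (m * s_k^T y_k), with
            # s_k = x_tilde_k - x_tilde_{k-1} and y_k = g_k - g_{k-1}.
            s = x_tilde - x_prev
            y = g - g_prev
            # abs() and the small constant are safeguards added in this
            # sketch, not part of the algorithm as stated in the paper.
            eta = s.dot(s) / (m * abs(s.dot(y)) + 1e-12)
        x_prev, g_prev = x_tilde.copy(), g

        x = x_tilde.copy()
        for _ in range(m):                  # inner loop of length m
            i = int(rng.integers(n))
            # variance-reduced stochastic gradient
            v = grad_i(x, i) - grad_i(x_tilde, i) + g
            x = x - eta * v
        x_tilde = x                         # next snapshot is the last iterate
    return x_tilde
```

A caller would supply grad_full and grad_i built from one of the LIBSVM data sets and, following the Experiment Setup row, set m = 2n when comparing against SVRG.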