Barzilai-Borwein Step Size for Stochastic Gradient Descent
Authors: Conghui Tan, Shiqian Ma, Yu-Hong Dai, Yuqiu Qian
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments on standard data sets show that the performance of SGD-BB and SVRG-BB is comparable to and sometimes even better than SGD and SVRG with best-tuned step sizes, and is superior to some advanced SGD variants. |
| Researcher Affiliation | Academia | Conghui Tan The Chinese University of Hong Kong chtan@se.cuhk.edu.hk Shiqian Ma The Chinese University of Hong Kong sqma@se.cuhk.edu.hk Yu-Hong Dai Chinese Academy of Sciences, Beijing, China dyh@lsec.cc.ac.cn Yuqiu Qian The University of Hong Kong qyq79@connect.hku.hk |
| Pseudocode | Yes | Algorithm 1 SVRG with BB step size (SVRG-BB) Parameters: update frequency m, initial point x0, initial step size η0 (only used in the first epoch) for k = 0, 1, ... do ... Algorithm 2 SGD with BB step size (SGD-BB) Parameters: update frequency m, initial step sizes η0 and η1 (only used in the first two epochs), weighting parameter β ∈ (0, 1), initial point x0 for k = 0, 1, ... do ... |
| Open Source Code | No | The paper does not include an explicit statement or link to the source code for the described methodology. |
| Open Datasets | Yes | We tested SVRG-BB and SGD-BB on three standard real data sets, which were downloaded from the LIBSVM website (footnote 1: www.csie.ntu.edu.tw/~cjlin/libsvmtools/). Detailed information of the data sets are given in Table 1. |
| Dataset Splits | No | The paper mentions using "standard real data sets" but does not specify the exact percentages or counts for training, validation, or test splits. It does not provide sufficient details for data partitioning reproducibility. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers required to reproduce the experiments. |
| Experiment Setup | Yes | For both SVRG-BB and SVRG, we set m = 2n as suggested in [10]. [For SGD-BB:] We set m = n, β = 10/m and η1 = η0 in our experiments. We used φ(k) = k + 1 when applying the smoothing technique. |
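
Because no source code is released, a minimal Python sketch of the SVRG-BB loop may help anyone attempting a reproduction. The Pseudocode row above elides the loop body, so the update used here follows the commonly cited form of the BB step for SVRG, η_k = ‖s‖² / (m · sᵀy) with s the difference of consecutive snapshot points and y the difference of their full gradients, and should be checked against Algorithm 1 in the paper. The names `svrg_bb`, `grad_i`, and `full_grad`, the synthetic ridge-regression problem, the value η0 = 0.005, and the positivity safeguard on the denominator are all assumptions made for illustration, not details from the paper. Only m = 2n is taken from the Experiment Setup row.

```python
import numpy as np

def svrg_bb(grad_i, n, x0, m, eta0, epochs, seed=0):
    """Sketch of SVRG with a Barzilai-Borwein step size (not the authors' code).

    grad_i(x, i) must return the gradient of the i-th component function at x.
    Returns the final snapshot point and the step size used in each epoch.
    """
    rng = np.random.default_rng(seed)
    x_tilde = np.asarray(x0, dtype=float).copy()
    eta = eta0                      # eta0 is only used in the first epoch
    x_prev = g_prev = None
    etas = []
    for k in range(epochs):
        # Full gradient at the current snapshot point.
        g = np.mean([grad_i(x_tilde, i) for i in range(n)], axis=0)
        if k > 0:
            s = x_tilde - x_prev    # change in snapshot point
            y = g - g_prev          # change in full gradient
            denom = m * s.dot(y)
            # Safeguard (our assumption): keep the previous step size
            # when the BB ratio is undefined or non-positive.
            if denom > 0:
                eta = s.dot(s) / denom
        etas.append(eta)
        x_prev, g_prev = x_tilde, g
        x = x_tilde.copy()
        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient step.
            x = x - eta * (grad_i(x, i) - grad_i(x_tilde, i) + g)
        x_tilde = x
    return x_tilde, etas

# Hypothetical test problem: ridge-regularized least squares on synthetic data.
data_rng = np.random.default_rng(1)
n, d, lam = 200, 10, 0.1
A = data_rng.standard_normal((n, d))
b = A @ data_rng.standard_normal(d)

def grad_i(x, i):
    return A[i] * (A[i] @ x - b[i]) + lam * x

def full_grad(x):
    return A.T @ (A @ x - b) / n + lam * x

x0 = np.zeros(d)
x_final, etas = svrg_bb(grad_i, n, x0, m=2 * n, eta0=0.005, epochs=30)
print("initial gradient norm:", np.linalg.norm(full_grad(x0)))
print("final gradient norm:  ", np.linalg.norm(full_grad(x_final)))
```

On this toy problem the returned `etas` list shows how the step size is recomputed once per epoch after the first, which is the behavior the paper contrasts with hand-tuned fixed step sizes. The sketch does not cover SGD-BB or the smoothing function φ(k).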