Hybrid Stochastic-Deterministic Minibatch Proximal Gradient: Less-Than-Single-Pass Optimization with Nearly Optimal Generalization
Authors: Pan Zhou, Xiao-Tong Yuan
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we carry out experiments to compare the numerical performance of HSDMPG with several representative stochastic gradient optimization algorithms, including SGD (Robbins & Monro, 1951), SVRG (Johnson & Zhang, 2013), APCG (Lin et al., 2014), Katyusha (Allen-Zhu, 2017) and SCSG (Lei & Jordan, 2017). We evaluate all the considered algorithms on two sets of strongly-convex learning tasks. The first set is for ridge regression with least-squares loss... In the second setting we consider two classification models: logistic regression... and multi-class softmax regression... We run simulations on ten datasets whose details are described in Appendix D.4. ... Figure 1: Single-epoch processing: stochastic gradient algorithms process data in a single pass on quadratic problems. ... Figure 2: Multi-epoch processing: stochastic gradient algorithms process data in multiple passes on quadratic problems. ... Figure 3: Multi-epoch processing (about 8 epochs): stochastic gradient algorithms process data in multiple passes on logistic regression problems (ijcnn and w08) and softmax regression problems (protein and letter). (Hedged sketches of these ℓ2-regularized objectives are given after this table.) |
| Researcher Affiliation | Collaboration | ¹Salesforce Research; ²B-DAT Lab and CICAEET, Nanjing University of Information Science & Technology, Nanjing, 210044, China. Correspondence to: Xiao-Tong Yuan <xtyuan@nuist.edu.cn>. |
| Pseudocode | Yes | Algorithm 1 Hybrid Stochastic-Deterministic Minibatch Proximal Gradient (HSDMPG) for quadratic loss. ... Algorithm 2 Hybrid Stochastic-Deterministic Minibatch Proximal Gradient (HSDMPG) on the generic loss. |
| Open Source Code | No | The paper does not provide a direct statement or link for the open-source code of the described methodology. |
| Open Datasets | Yes | We run simulations on ten datasets whose details are described in Appendix D.4. ... Appendix D.4 Datasets Details: We conduct experiments on ten datasets from LIBSVM (Chang & Lin, 2011) and UCI (Dua & Graff, 2017) repository, which cover a wide range of applications and data properties. |
| Dataset Splits | No | The paper mentions training and test sets (e.g., for 'ijcnn1': "The training set consists of 49,040 samples with 22 features, and the test set consists of 9,869 samples"), but does not provide specific details for a validation set or explicit split percentages for training, validation, and test. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using specific algorithms (e.g., SVRG) and datasets from LIBSVM and UCI, but it does not specify any programming languages, libraries, or frameworks with their version numbers that were used for implementation. |
| Experiment Setup | Yes | For HSDMPG, we set the size s of S around n^0.75. For the minibatch for inner problems, we set the initial minibatch size |S_1| = 50 and then follow our theory to exponentially expand the size of S_t with a proper exponential rate. The regularization constant in the subproblem (3) is set to γ = √(log(d)/s) as suggested by our theory. The optimization error ε_t in (3) is controlled by respectively allowing SVRG to run 3 epochs and 10 epochs on the two sets of tasks. Similarly, we control the optimization error ε_t in (5) by running SVRG with 3 epochs. ... we set the regularization parameter µ = 0.01 to make the quadratic problems well-conditioned. ... Here we reset the regularization strength parameter in quadratic problems as µ = 10^-4 for generating more challenging optimization tasks. ... their regularization modulus parameters are set as µ = 0.01. (A hedged configuration sketch assembling these settings follows the table.) |
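
The Research Type row above quotes the paper's two task families: ridge regression with least-squares loss, and logistic/softmax regression, each with an ℓ2 regularization modulus µ. The sketch below is not the authors' code; it is a minimal illustration, under the assumption of standard formulations of these objectives, with placeholder function names chosen here for clarity.

```python
import numpy as np

# Hypothetical sketch (not from the paper) of the three l2-regularized
# objectives named in the experiments: ridge regression, binary logistic
# regression, and multi-class softmax regression, each with modulus mu.

def ridge_objective(w, X, y, mu):
    """Least-squares loss with l2 regularization: (1/2n)||Xw - y||^2 + (mu/2)||w||^2."""
    n = X.shape[0]
    residual = X @ w - y
    return 0.5 * (residual @ residual) / n + 0.5 * mu * (w @ w)

def logistic_objective(w, X, y, mu):
    """Binary logistic loss with labels y in {-1, +1}, plus l2 regularization."""
    margins = y * (X @ w)
    # log(1 + exp(-margin)) computed stably via logaddexp
    return np.mean(np.logaddexp(0.0, -margins)) + 0.5 * mu * (w @ w)

def softmax_objective(W, X, y, mu):
    """Multi-class cross-entropy; W has shape (d, K), y holds integer class indices."""
    n = X.shape[0]
    scores = X @ W                                  # (n, K) class scores
    scores -= scores.max(axis=1, keepdims=True)     # shift for numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(n), y]) + 0.5 * mu * np.sum(W * W)
```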
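
The Experiment Setup row reports the hyperparameters used for HSDMPG: subsample size s around n^0.75, γ = √(log(d)/s) for subproblem (3), an inner minibatch starting at |S_1| = 50 and expanded exponentially, SVRG as the inexact inner solver (3 or 10 epochs), and µ ∈ {0.01, 10^-4}. The configuration sketch below assembles these reported values; it is only an assumption-labeled illustration, and in particular `growth_rate` and the function name `hsdmpg_config` are placeholders, since the paper's exact exponential rate is not reproduced here.

```python
import math

# Hypothetical configuration sketch (not the authors' code) collecting the
# hyperparameters quoted in the Experiment Setup row above.

def hsdmpg_config(n, d, growth_rate=2.0, n_stages=10):
    s = int(round(n ** 0.75))                # subsample size s around n^0.75
    gamma = math.sqrt(math.log(d) / s)       # regularization constant for subproblem (3)
    # Inner minibatch schedule: |S_1| = 50, then exponential expansion
    # (growth_rate is an assumed placeholder), capped at the subsample size s.
    minibatch_sizes = [min(int(50 * growth_rate ** t), s) for t in range(n_stages)]
    return {
        "subsample_size": s,
        "gamma": gamma,
        "minibatch_sizes": minibatch_sizes,
        "inner_solver": "SVRG",              # run for 3 or 10 epochs per subproblem
        "mu_well_conditioned": 1e-2,         # well-conditioned quadratic setting
        "mu_ill_conditioned": 1e-4,          # more challenging quadratic setting
    }

# Example with the reported ijcnn1 training set size and feature dimension.
print(hsdmpg_config(n=49040, d=22))
```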