Nonconvex Variance Reduced Optimization with Arbitrary Sampling
Authors: Samuel Horváth, Peter Richtárik
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide the first importance sampling variants of variance-reduced algorithms for empirical risk minimization with non-convex loss functions. Our methods have the capacity to speed up the training process by an order of magnitude compared to the state of the art on real datasets. Moreover, we also improve upon current mini-batch analysis of these methods by proposing importance sampling for minibatches in this setting. Ours are the first optimal samplings for minibatches in the literature on stochastic optimization. Finally, we also perform a novel importance sampling analysis of SARAH in the convex setting. In this section, we perform experiments with regression for binary classification... We use four LIBSVM datasets: covtype, ijcnn1, splice, australian. |
| Researcher Affiliation | Academia | ¹King Abdullah University of Science and Technology, Saudi Arabia; ²Moscow Institute of Physics and Technology, Russia; ³University of Edinburgh, United Kingdom. |
| Pseudocode | Yes | Algorithm 1: SVRG with arbitrary sampling; Algorithm 2: SAGA with arbitrary sampling; Algorithm 3: SARAH with arbitrary sampling; Algorithm 4: GD-Algorithm. (A hedged sketch of the SVRG loop structure appears below the table.) |
| Open Source Code | No | The paper does not provide explicit statements or links to open-source code for the methodology it describes. It only links to the LIBSVM dataset collection. |
| Open Datasets | Yes | We use four LIBSVM datasets: covtype, ijcnn1, splice, australian. The LIBSVM dataset collection is available at https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ (A loading example appears below the table.) |
| Dataset Splits | No | The paper mentions using LIBSVM datasets and discusses training, but it does not specify the train/validation/test splits used for these datasets, nor does it provide any information on how data was partitioned for training and evaluation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch, or specific solver versions). |
| Experiment Setup | Yes | Parameters of each algorithm are chosen as suggested by the theorems in Section 3, and x0 = 0. For SARAH, we chose m = n/b. Theorem 3 (Complexity of SVRG with arbitrary sampling): ...step size η = μ₂b/(αLn^(2/3)), and parameters β = L/n^(1/3), m = nα/(3bμ₂)... Theorem 4 (Complexity of SAGA with arbitrary sampling): ...step size η = μ₃b/(2αLn^(2/3)), and parameter d = b/α... Theorem 5 (Complexity of SARAH with arbitrary sampling): ...step size η... (A worked computation of these parameter choices appears below the table.) |
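
The paper gives full pseudocode for Algorithms 1–3, but this report quotes only their captions. Below is a minimal NumPy sketch of how an SVRG loop with an independent importance sampling (each example i included with probability p_i ∝ L_i, in the spirit of the paper's importance-sampling theme) can be structured. The oracle names `grad_i` and `full_grad`, and all parameter names, are our own placeholders rather than the paper's notation; this is a sketch under those assumptions, not the authors' implementation.

```python
import numpy as np

def svrg_arb_sampling(grad_i, full_grad, x0, L, b, eta, m, epochs, seed=0):
    """Sketch of an SVRG loop with an independent importance sampling.

    grad_i(i, x) and full_grad(x) are user-supplied oracles for the i-th
    component gradient and the full gradient of f = (1/n) sum_i f_i.
    L holds per-example smoothness constants L_1, ..., L_n (all > 0).
    """
    rng = np.random.default_rng(seed)
    n = len(L)
    # Include example i with probability p_i proportional to L_i, capped
    # at 1, so the expected minibatch size is roughly b.
    p = np.minimum(1.0, b * L / L.sum())
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(epochs):
        w = x.copy()          # snapshot point
        mu = full_grad(w)     # full gradient at the snapshot
        for _ in range(m):
            S = np.flatnonzero(rng.random(n) < p)  # sampled minibatch
            v = mu.copy()
            for i in S:
                # The 1/(n p_i) weighting keeps the estimator unbiased:
                # E[v] = mu + (1/n) sum_i (grad f_i(x) - grad f_i(w)).
                v += (grad_i(i, x) - grad_i(i, w)) / (n * p[i])
            x -= eta * v
    return x
```

The 1/(n p_i) reweighting is what makes the estimator unbiased for any sampling with p_i > 0; uniform sampling is recovered by setting p_i = b/n.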
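The paper links only to the LIBSVM collection and does not say how the files were read. A common way to load them in Python, assuming scikit-learn is available (a toolchain choice of ours, not the paper's):

```python
from sklearn.datasets import load_svmlight_file

# Files such as 'ijcnn1' or 'australian' must first be downloaded from
# https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
X, y = load_svmlight_file("ijcnn1")  # X: scipy.sparse CSR matrix, y: labels
print(X.shape, y.shape)
```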
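To make the quoted parameter choices concrete, here is a small worked computation. Only n (the size of covtype) is a real dataset constant; b, L, α, and μ₂ are assumed values set to illustrative defaults (α = μ₂ = 1 corresponds to uniform sampling) and are not reported in the paper.

```python
n, b, L = 581_012, 64, 1.0   # covtype size; b and L are illustrative
alpha, mu2 = 1.0, 1.0        # sampling-dependent constants (assumed)

eta  = mu2 * b / (alpha * L * n ** (2 / 3))  # SVRG step size (Theorem 3)
beta = L / n ** (1 / 3)                      # SVRG parameter beta
m    = int(n * alpha / (3 * b * mu2))        # SVRG inner-loop length
m_sarah = n // b                             # m = n/b chosen for SARAH
print(f"eta={eta:.3e}, beta={beta:.3f}, m={m}, m_sarah={m_sarah}")
```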