Nonconvex Variance Reduced Optimization with Arbitrary Sampling

Authors: Samuel Horváth, Peter Richtárik

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We provide the first importance sampling variants of variance-reduced algorithms for empirical risk minimization with non-convex loss functions. Our methods have the capacity to speed up the training process by an order of magnitude compared to the state of the art on real datasets. Moreover, we also improve upon current mini-batch analysis of these methods by proposing importance sampling for minibatches in this setting. Ours are the first optimal samplings for minibatches in the literature on stochastic optimization. Finally, we also perform a novel importance sampling analysis of SARAH in the convex setting." "In this section, we perform experiments with logistic regression for binary classification... We use four LIBSVM datasets: covtype, ijcnn1, splice, australian."
Researcher Affiliation | Academia | "¹King Abdullah University of Science and Technology, Saudi Arabia; ²Moscow Institute of Physics and Technology, Russia; ³University of Edinburgh, United Kingdom."
Pseudocode | Yes | Algorithm 1 (SVRG with arbitrary sampling); Algorithm 2 (SAGA with arbitrary sampling); Algorithm 3 (SARAH with arbitrary sampling); Algorithm 4 (GD). A minimal sketch in the spirit of Algorithm 1 follows the table.
Open Source Code | No | The paper does not provide explicit statements or links to open-source code for the methodology it describes. It only links to the LIBSVM dataset collection.
Open Datasets | Yes | "We use four LIBSVM datasets: covtype, ijcnn1, splice, australian. The LIBSVM dataset collection is available at https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/" A loading example follows the table.
Dataset Splits | No | The paper mentions using LIBSVM datasets and discusses training, but it does not specify train/validation/test splits or how the data was partitioned for training and evaluation.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch, or specific solver versions).
Experiment Setup | Yes | "Parameters of each algorithm are chosen as suggested by the theorems in Section 3, and x₀ = 0. For SARAH, we chose m = n/b." Theorem 3 (complexity of SVRG with arbitrary sampling) prescribes the step size η = μ₂b/(αLn^(2/3)) and the parameters β = L/n^(1/3) and m = nα/(3bμ₂); Theorem 4 (complexity of SAGA with arbitrary sampling) prescribes the step size η = b/(μ₃αL^2 n^(2/3)) and the parameter d = b/α; Theorem 5 (complexity of SARAH with arbitrary sampling) ties the step size η to (m+1)b.
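
As a companion to the Pseudocode row, here is a minimal Python sketch of minibatch SVRG with importance sampling, in the spirit of Algorithm 1. Everything in it is an illustrative assumption rather than the paper's exact construction: the probability choice p_i ∝ L_i is one standard importance sampling, the helper grad_i and all parameter names are hypothetical, and the paper's arbitrary-sampling framework covers far more general sampling distributions.

```python
import numpy as np

def svrg_importance_sampling(grad_i, L, x0, b, eta, m, epochs, seed=0):
    """Minibatch SVRG where example i is drawn with probability p[i]
    proportional to its smoothness constant L[i] (importance sampling).
    grad_i(x, i) returns the gradient of the i-th loss at x.
    Illustrative sketch only, not the paper's exact Algorithm 1."""
    rng = np.random.default_rng(seed)
    n = len(L)
    p = L / L.sum()                      # importance sampling probabilities
    x = x0.copy()
    for _ in range(epochs):
        y = x.copy()                     # snapshot point
        full_grad = np.mean([grad_i(y, i) for i in range(n)], axis=0)
        for _ in range(m):
            S = rng.choice(n, size=b, replace=True, p=p)  # sampled minibatch
            # The 1/(n * p[i]) weights make each correction term unbiased,
            # so g is an unbiased estimator of the full gradient at x.
            g = full_grad + np.mean(
                [(grad_i(x, i) - grad_i(y, i)) / (n * p[i]) for i in S],
                axis=0)
            x = x - eta * g
    return x
```

Drawing large-L_i examples more often while reweighting by 1/(n p_i) preserves unbiasedness, and it is this reweighting that shrinks the variance of the gradient estimator relative to uniform sampling.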
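
For the Open Datasets row, the four LIBSVM files can be loaded with scikit-learn's load_svmlight_file. The local file name `australian` below is an assumption; the file must first be downloaded from the LIBSVM page linked above.

```python
from sklearn.datasets import load_svmlight_file

# Assumes the `australian` file has been downloaded from
# https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
X, y = load_svmlight_file("australian")  # X: sparse CSR features, y: labels
print(X.shape, sorted(set(y)))
```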