Optimal Mini-Batch and Step Sizes for SAGA

Authors: Nidham Gazagnadou, Robert Gower, Joseph Salmon

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility variables, results, and LLM responses:

Research Type: Experimental
"We fill in this gap, and provide simple closed form expressions for the expected smoothness constant and careful numerical experiments verifying these bounds. Using these bounds, and since the SAGA algorithm is part of this JacSketch family, we suggest a new standard practice for setting the step and mini-batch sizes for SAGA that are competitive with a numerical grid search. Furthermore, we can now show that the total complexity of the SAGA algorithm decreases linearly in the mini-batch size up to a pre-defined value: the optimal mini-batch size. This is a rare result in the stochastic variance-reduced literature, only previously shown for the Katyusha algorithm. Finally, we conjecture that this is the case for many other stochastic variance-reduced methods, and that our bounds and analysis of the expected smoothness constant are key to extending these results."

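The "simple closed form expressions" mentioned above interpolate between L_max (the largest per-sample smoothness constant, recovered at b = 1) and L (the smoothness constant of the full objective, recovered at b = n). Below is a minimal Julia sketch of such a bound; the exact interpolation form is our reading of the paper's simple bound and should be treated as an assumption, not a quotation:

```julia
# Simple closed-form bound on the expected smoothness constant L(b) for
# b-nice sampling. Assumed form (see lead-in): an interpolation between
# Lmax at b = 1 and L at b = n.
L_simple(b, n, L, Lmax) = (n * (b - 1)) / (b * (n - 1)) * L +
                          (n - b) / (b * (n - 1)) * Lmax

# Sanity checks at the endpoints: b = 1 recovers Lmax, b = n recovers L.
n, L, Lmax = 10_000, 1.0, 50.0
@assert L_simple(1, n, L, Lmax) ≈ Lmax
@assert L_simple(n, n, L, Lmax) ≈ L
```
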
Researcher Affiliation: Academia
"1 LTCI, Télécom ParisTech, Université Paris-Saclay, Paris, France; 2 IMAG, Univ Montpellier, CNRS, Montpellier, France."

Pseudocode: Yes
Algorithm 1 (JacSketch version of b-nice SAGA):
  Input: mini-batch size b, step size γ > 0
  Initialize: w^0 ∈ ℝ^d, J^0 ∈ ℝ^{d×n}
  for k = 0, 1, 2, … do
    Sample a fresh batch B ⊆ [n] s.t. |B| = b
    g^k = (1/n) Σ_{j=1}^n J^k_{:j} + (1/b) Σ_{i∈B} (∇f_i(w^k) − J^k_{:i})   // update the gradient estimate
    J^{k+1}_{:i} = ∇f_i(w^k) if i ∈ B, else J^k_{:i}                         // update the Jacobian estimate
    w^{k+1} = w^k − γ g^k                                                    // take a step
  return w^k

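For concreteness, here is a minimal runnable Julia sketch of Algorithm 1. The `grad_i` oracle, the ridge-regression usage, and all constants are illustrative choices of ours, not the authors' code (their implementation is StochOpt.jl):

```julia
# Minimal sketch of b-nice SAGA (Algorithm 1). `grad_i(w, i)` is an assumed
# user-supplied oracle returning ∇f_i(w).
using LinearAlgebra, Random, Statistics

function bnice_saga(grad_i, n::Int, d::Int; b::Int = 1, γ::Float64 = 1e-2,
                    iters::Int = 10_000, rng = Random.default_rng())
    w = zeros(d)
    J = zeros(d, n)                     # Jacobian estimate: column i holds the last seen ∇f_i
    for k in 1:iters
        B = randperm(rng, n)[1:b]       # fresh b-nice sample: uniform, without replacement
        g = vec(mean(J, dims = 2))      # (1/n) Σ_j J_{:j}
        for i in B
            gi = grad_i(w, i)
            g .+= (gi .- J[:, i]) ./ b  # + (1/b) Σ_{i∈B} (∇f_i(w) − J_{:i})
            J[:, i] = gi                # update the Jacobian estimate on sampled columns
        end
        w .-= γ .* g                    # take a step
    end
    return w
end

# Illustrative usage on ridge regression: f_i(w) = ½(aᵢᵀw − yᵢ)² + (λ/2)‖w‖².
A, y, λ = randn(200, 10), randn(200), 1e-3
gradi(w, i) = A[i, :] .* (dot(A[i, :], w) - y[i]) .+ λ .* w
w = bnice_saga(gradi, 200, 10; b = 4, γ = 0.05, iters = 5_000)
```
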
Open Source Code: Yes
"All the experiments were run in Julia and the code is freely available at https://github.com/gowerrobert/StochOpt.jl."

Open Datasets: Yes
"Publicly available datasets from LIBSVM and the UCI repository."

Dataset Splits: No
The paper does not specify explicit training/validation/test splits by percentages or sample counts, nor does it refer to pre-defined splits. It mentions running on the datasets until a relative error of 10⁻⁴ is reached as a stopping criterion, but provides no split information.

Hardware Specification: No
The paper states that experiments were run in Julia but does not provide any details about the hardware used, such as CPU or GPU models or memory.

Software Dependencies: No
The paper mentions that experiments were "run in Julia" but does not specify version numbers for Julia itself or for any other software libraries used in the experiments.

Experiment Setup: Yes
"For instance, for the simple bound, given ϵ > 0, plugging (18) into (16) gives
  K_total(b) = max{g_simple(b), h(b)} · log(1/ϵ),
where
  g_simple(b) := (4(nL − L_max + (n−1)λ))/(µ(n−1)) · b + (4n(L_max − L))/(µ(n−1)),
  h(b) := −(4(L_max + λ))/(µ(n−1)) · b + n(1 + (4(L_max + λ))/(µ(n−1))).
We compare the performance of SAGA when using the mini-batch size and step size b = 1, γ_Defazio := 1/(3(nµ + L_max)) given in Defazio et al. (2014), and b = 20, γ_Hofmann := 20/(nµ) given in Hofmann et al. (2015), to our new practical mini-batch size b_practical = 1 + µ(n−1)/(4L) and step size γ_practical. Our goal is to verify how much our parameter setting can improve practical performance. We also compare with a step size γ_gridsearch obtained by grid search over odd powers of 2. These methods are run until they reach a relative error of 10⁻⁴. Our grid is {2^i : i = 0, …, 14}, with 2^16, 2^18 and n added when needed."
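
To see the claimed linear decrease of the total complexity in b, the two branches quoted above can be evaluated directly. A small Julia sketch, with made-up problem constants (n, L, L_max, µ, λ) and a floored b_practical so it can serve as an integer mini-batch size:

```julia
# Total-complexity branches from the entry above (reconstructed forms).
# All constants below are illustrative, not from the paper's experiments.
g_simple(b; n, L, Lmax, µ, λ) = 4 * (n * L - Lmax + (n - 1) * λ) / (µ * (n - 1)) * b +
                                4 * n * (Lmax - L) / (µ * (n - 1))
h(b; n, L, Lmax, µ, λ)        = -4 * (Lmax + λ) / (µ * (n - 1)) * b +
                                n * (1 + 4 * (Lmax + λ) / (µ * (n - 1)))

n, L, Lmax, µ, λ = 10_000, 1.0, 50.0, 1e-3, 1e-4
Ktotal(b) = max(g_simple(b; n, L, Lmax, µ, λ), h(b; n, L, Lmax, µ, λ))

# h decreases and g_simple increases in b, so K_total(b) falls linearly
# until the two branches cross; past that point, larger b only costs more.
b_star = argmin(Ktotal, 1:n)

# Practical rule from the quoted setup: b_practical = 1 + µ(n−1)/(4L),
# floored here (our choice) to yield an integer mini-batch size.
b_practical = 1 + floor(Int, µ * (n - 1) / (4L))
println("b* on grid = $b_star, b_practical = $b_practical")
```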