Stochastic Reweighted Gradient Descent

Authors: Ayoub El Hanchi, David Stephens, Chris Maddison

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate improved convergence in practice on regularized logistic regression problems. In this section, we empirically verify our two main claims: (i) SRG performs variance reduction, which can improve the asymptotic error of SGD; (ii) SRG+ performs both variance reduction and preconditioning, and can both reduce the asymptotic error of SGD and allow the use of larger step sizes. We start with controlled synthetic experiments that provide direct support for our claims. We then compare SRG to other baseline optimizers on ℓ2-regularized logistic regression problems.
Researcher Affiliation | Academia | 1 University of Toronto and Vector Institute; 2 McGill University. Correspondence to: Ayoub El Hanchi <aelhan@cs.toronto.edu>.
Pseudocode | Yes | Algorithm 1 SRG (a runnable sketch follows this table)
Parameters: step sizes (α_k)_{k=0}^∞ > 0, mixture coefficients (θ_k)_{k=0}^∞ ∈ (0, 1]
Initialization: x_0 ∈ R^d, (‖g_i^0‖²)_{i=1}^n ∈ R^n
for k = 0, 1, 2, ... do
    p^k = (1 − θ_k) q^k + θ_k / n   {q^k is defined in (6)}
    b_k ∼ Bernoulli(θ_k)
    if b_k = 1 then i_k ∼ 1/n else i_k ∼ q^k
    x_{k+1} = x_k − α_k ∇f_{i_k}(x_k) / (n p^k_{i_k})
    ‖g_i^{k+1}‖² = ‖∇f_i(x_k)‖² if b_k = 1 and i_k = i, ‖g_i^k‖² otherwise
end for
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, nor does it include a specific repository link or an explicit code release statement.
Open Datasets | Yes | We experiment with SRG on four datasets from LIBSVM (Chang & Lin, 2011): ijcnn1, w8a, mushrooms and phishing. (A loading sketch appears below the table.)
Dataset Splits | No | The paper mentions selecting a subset of data of size n = 1000 but does not provide specific details on how this data is split into training, validation, and test sets (e.g., percentages, sample counts, or references to predefined splits).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. It describes the datasets and experimental setup but omits hardware specifications.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment.
Experiment Setup | Yes | We used the mixture coefficient θ = 1/2 for SRG, SRG+, and SGD++, and used the same step size α = θ/(2L) for all algorithms. For each dataset, and to be able to efficiently run our experiments, we randomly select a subset of the data of size n = 1000. (A setup sketch appears below the table.)
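
The reconstructed pseudocode in the Pseudocode row maps directly to a short implementation. Below is a minimal NumPy sketch of the SRG loop, assuming q^k is taken proportional to the stored gradient norms ‖g_i^k‖ (the paper's exact definition is its equation (6)); the function name srg and the grad(i, x) oracle are illustrative, not from the paper.

import numpy as np

def srg(grad, x0, n, alpha, theta=0.5, iters=1000, rng=None):
    # Sketch of Algorithm 1 (SRG). grad(i, x) returns the gradient of the
    # i-th component function f_i at x. Assumption: q_i^k is proportional
    # to the stored norm ||g_i^k||; the paper defines q^k in its eq. (6).
    rng = np.random.default_rng() if rng is None else rng
    x = x0.copy()
    # stored gradient norms ||g_i^0||, initialized at x0
    g_norms = np.array([np.linalg.norm(grad(i, x)) for i in range(n)])
    for _ in range(iters):
        s = g_norms.sum()
        q = g_norms / s if s > 0 else np.full(n, 1.0 / n)
        p = (1.0 - theta) * q + theta / n      # mixture with the uniform distribution
        if rng.random() < theta:               # b_k ~ Bernoulli(theta)
            i = int(rng.integers(n))           # i_k ~ uniform
            g = grad(i, x)
            g_norms[i] = np.linalg.norm(g)     # refresh the stored norm for index i
        else:
            i = int(rng.choice(n, p=q))        # i_k ~ q^k
            g = grad(i, x)
        x = x - alpha * g / (n * p[i])         # unbiased importance-weighted step
    return x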
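
For the Open Datasets row: the four LIBSVM datasets are publicly downloadable, and scikit-learn can parse the LIBSVM/svmlight format. A minimal loading sketch, assuming the files have been fetched into a local data/ directory (the paths are hypothetical; the paper does not state its tooling):

import numpy as np
from sklearn.datasets import load_svmlight_file

# Hypothetical local paths; the files themselves come from the LIBSVM
# binary-classification collection (Chang & Lin, 2011).
datasets = {}
for name in ["ijcnn1", "w8a", "mushrooms", "phishing"]:
    X, y = load_svmlight_file(f"data/{name}")  # X is a sparse CSR matrix
    # some of these datasets use {0, 1} or {1, 2} labels; remap to {-1, +1}
    y = np.where(y == y.max(), 1.0, -1.0)
    datasets[name] = (X, y)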
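
And for the Experiment Setup row: a sketch of the reported configuration (θ = 1/2, α = θ/(2L), random subset of size n = 1000). The regularization strength lam and the smoothness bound L = max_i ‖a_i‖²/4 + λ are assumptions: they are standard for ℓ2-regularized logistic regression but are not spelled out in the extract above.

import numpy as np

def setup(X, y, lam=1e-3, theta=0.5, n=1000, rng=None):
    # Subsample n = 1000 examples and set alpha = theta / (2 L), as in the
    # reported setup. lam is a placeholder value; L uses the standard
    # smoothness bound max_i ||a_i||^2 / 4 + lam for l2-regularized
    # logistic loss (an assumption, not stated in the extract).
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(X.shape[0], size=n, replace=False)
    A, b = X[idx].toarray(), y[idx]             # dense n x d subset, labels in {-1, +1}
    L = (A ** 2).sum(axis=1).max() / 4.0 + lam  # smoothness constant
    alpha = theta / (2.0 * L)                   # step size shared by all algorithms

    def grad(i, x):
        # gradient of f_i(x) = log(1 + exp(-b_i a_i^T x)) + (lam / 2) ||x||^2
        return -b[i] * A[i] / (1.0 + np.exp(b[i] * (A[i] @ x))) + lam * x

    return grad, alpha, np.zeros(A.shape[1])

Putting the pieces together, grad, alpha, x0 = setup(*datasets["ijcnn1"]) followed by srg(grad, x0, n=1000, alpha=alpha, theta=0.5) runs the θ = 1/2 configuration on one subsampled dataset.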