Stochastic Reweighted Gradient Descent
Authors: Ayoub El Hanchi, David Stephens, Chris Maddison
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate improved convergence in practice on regularized logistic regression problems. In this section, we empirically verify our two main claims: (i) SRG performs variance reduction which can improve the asymptotic error of SGD. (ii) SRG+ performs both variance reduction and preconditioning, and can both reduce the asymptotic error of SGD and allow the use of larger step sizes. We start with controlled synthetic experiments that provide direct support for our claims. We then compare SRG to other baseline optimizers on ℓ2-regularized logistic regression problems. |
| Researcher Affiliation | Academia | ¹University of Toronto and Vector Institute, ²McGill University. Correspondence to: Ayoub El Hanchi <aelhan@cs.toronto.edu>. |
| Pseudocode | Yes | Algorithm 1 (SRG). Parameters: step sizes (α_k)_{k≥0} > 0, mixture coefficients (θ_k)_{k≥0} ⊂ (0, 1]. Initialization: x_0 ∈ ℝ^d, (‖g_i^0‖²)_{i=1}^n ∈ ℝ^n. For k = 0, 1, 2, …: set p_k = (1 − θ_k) q_k + θ_k/n (q_k is defined in (6)); sample b_k ~ Bernoulli(θ_k); if b_k = 1, draw i_k uniformly from {1, …, n}, else draw i_k ~ q_k; update x_{k+1} = x_k − α_k ∇f_{i_k}(x_k) / (n p_k^{i_k}); set ‖g_i^{k+1}‖² = ‖∇f_i(x_k)‖² if b_k = 1 and i_k = i, and ‖g_i^{k+1}‖² = ‖g_i^k‖² otherwise. (A runnable sketch follows this table.) |
| Open Source Code | No | The paper does not provide access to source code for the described methodology: it includes neither a repository link nor an explicit code-release statement. |
| Open Datasets | Yes | We experiment with SRG on four datasets from LIBSVM (Chang & Lin, 2011): ijcnn1, w8a, mushrooms and phishing. |
| Dataset Splits | No | The paper mentions selecting a subset of data of size n=1000 but does not provide specific details on how this data is split into training, validation, and test sets (e.g., percentages, sample counts, or references to predefined splits). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. It describes the datasets and experimental setup but omits hardware specifications. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | We used the mixture coefficient θ = 1/2 for SRG, SRG+, and SGD++, and the same step size α = θ/(2L) for all algorithms. For each dataset, and to be able to efficiently run our experiments, we randomly select a subset of the data of size n = 1000. (A setup sketch follows this table.) |
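
Below is a minimal Python sketch of one SRG iteration, following the Algorithm 1 pseudocode quoted above. It assumes q_k is proportional to the stored gradient norms (the paper defines q_k in its equation (6)), and `grad_f_i` is a hypothetical per-example gradient oracle; this is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

def srg_step(x, grad_sq_norms, grad_f_i, alpha, theta, rng):
    """One SRG iteration (sketch of Algorithm 1).

    Assumption: q_k is proportional to sqrt(grad_sq_norms); the paper
    defines q_k in its equation (6). grad_f_i(x, i) is a hypothetical
    per-example gradient oracle. grad_sq_norms is updated in place.
    """
    n = len(grad_sq_norms)
    # Build q_k from the stored squared norms; fall back to uniform if all zero.
    norms = np.sqrt(grad_sq_norms)
    q = norms / norms.sum() if norms.sum() > 0 else np.full(n, 1.0 / n)
    # Mixture distribution p_k = (1 - theta) * q_k + theta / n.
    p = (1.0 - theta) * q + theta / n
    # Sample the mixture component b_k, then the index i_k.
    b = rng.random() < theta
    i = rng.integers(n) if b else rng.choice(n, p=q)
    # Importance-weighted update: x_{k+1} = x_k - alpha * grad / (n * p_i).
    g = grad_f_i(x, i)
    x = x - alpha * g / (n * p[i])
    # Refresh the stored squared norm only on uniform draws (b_k = 1).
    if b:
        grad_sq_norms[i] = float(g @ g)
    return x
```

Matching the algorithm's initialization, `grad_sq_norms` would be seeded with per-example squared gradient norms at x_0 (or any nonnegative values) before the first step.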
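
And a sketch of the reported experimental setup (θ = 1/2, α = θ/(2L), subsample of n = 1000 from a LIBSVM dataset). The assumptions are marked in comments: scikit-learn's `load_svmlight_file` as the loader, the random seed, the regularization strength, and L taken as the standard smoothness constant of the ℓ2-regularized logistic loss; none of these is specified in the excerpts above.

```python
import numpy as np
from sklearn.datasets import load_svmlight_file  # assumed loader for LIBSVM files

# Load a LIBSVM dataset (e.g. ijcnn1) and subsample n = 1000 examples,
# matching the setup described in the paper.
X, y = load_svmlight_file("ijcnn1")    # local path assumed; file from the LIBSVM site
rng = np.random.default_rng(0)         # seed assumed; the paper reports none
idx = rng.choice(X.shape[0], size=1000, replace=False)
X, y = X[idx].toarray(), y[idx]

# theta = 1/2 and alpha = theta / (2L), as reported in the setup.
# Assumption: L is the smoothness constant of the l2-regularized logistic
# loss, L = lambda_max(X^T X) / (4n) + lam.
n, lam = X.shape[0], 1e-3              # regularization strength assumed
L = np.linalg.eigvalsh(X.T @ X).max() / (4 * n) + lam
theta = 0.5
alpha = theta / (2 * L)
```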