SGD: General Analysis and Improved Rates
Authors: Robert Mansel Gower, Nicolas Loizou, Xun Qian, Alibek Sailanbayev, Egor Shulgin, Peter Richtárik
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically validate our theoretical results. We perform three experiments in each of which we highlight a different aspect of our contributions. |
| Researcher Affiliation | Academia | (1) Telecom ParisTech, LTCI, Université Paris-Saclay, France; (2) University of Edinburgh, United Kingdom; (3) King Abdullah University of Science and Technology, Kingdom of Saudi Arabia; (4) Moscow Institute of Physics and Technology, Russian Federation. |
| Pseudocode | No | The paper presents mathematical update rules for SGD (e.g., equation 6) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement or link regarding the public availability of source code for the methodology described. |
| Open Datasets | Yes | For our experiments on real data we choose several LIBSVM (Chang & Lin, 2011) datasets. ... Ridge regression problem (first row): on left synthetic data, on right real dataset: abalone from LIBSVM. Logistic regression problem (second row): on left synthetic data, on right real data-set: a1a from LIBSVM. ... Above: the w3a data-set from LIBSVM. |
| Dataset Splits | No | The paper mentions using synthetic and real datasets but does not explicitly provide details about training/validation/test splits, such as percentages or sample counts. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using LIBSVM but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | In all experiments, to evaluate SGD we use the relative error measure $\lVert x^k - x^{\ast} \rVert^2 / \lVert x^0 - x^{\ast} \rVert^2$. For all implementations, the starting point $x^0$ is sampled from the standard Gaussian. We run each method until $\lVert x^k - x^{\ast} \rVert^2 < 10^{-3}$ or until a prespecified maximum number of epochs is achieved. ... In all experiments λ = 1/n. ... n = 4912, d = 300, λ = 100/n, $\epsilon = 10^{-3}$, τ = n/100 |
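The evaluation protocol in the Experiment Setup row can be illustrated with a small sketch. No code was released with the paper, so this is a hypothetical reconstruction under stated assumptions, not the authors' implementation. It assumes the ridge objective $f(x) = \frac{1}{2n}\lVert Ax - b \rVert^2 + \frac{\lambda}{2}\lVert x \rVert^2$, minibatch SGD of the form $x^{k+1} = x^k - \gamma \nabla f_{v_k}(x^k)$ with τ-nice sampling (so the stochastic gradient is the average of $\nabla f_i$ over τ indices drawn without replacement), a constant step size $\gamma = 1/(2 L_{\max})$ with $L_i = \lVert a_i \rVert^2 + \lambda$, and a stopping check once per epoch. The function name `run_ridge_sgd` and all of these choices are illustrative assumptions.

```python
import numpy as np


def run_ridge_sgd(A, b, lam, tau, gamma, tol=1e-3, max_epochs=100, seed=0):
    """Hypothetical sketch: minibatch SGD on ridge regression with the stated stopping rule.

    Assumed objective: f(x) = (1/(2n))||Ax - b||^2 + (lam/2)||x||^2,
    i.e. f_i(x) = 0.5*(a_i^T x - b_i)^2 + (lam/2)||x||^2 averaged over i.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    # Exact minimizer x*: solves (A^T A / n + lam I) x = A^T b / n.
    x_star = np.linalg.solve(A.T @ A / n + lam * np.eye(d), A.T @ b / n)
    x = rng.standard_normal(d)  # x^0 sampled from the standard Gaussian
    denom = np.sum((x - x_star) ** 2)
    history = [1.0]  # relative error ||x^k - x*||^2 / ||x^0 - x*||^2, recorded per epoch
    steps_per_epoch = max(n // tau, 1)  # one epoch = roughly n sampled gradients
    for _ in range(max_epochs):
        for _ in range(steps_per_epoch):
            # tau-nice sampling: tau distinct indices, uniformly at random.
            idx = rng.choice(n, size=tau, replace=False)
            Ai = A[idx]
            # Average of grad f_i over the minibatch: (a_i^T x - b_i) a_i + lam x.
            grad = Ai.T @ (Ai @ x - b[idx]) / tau + lam * x
            x = x - gamma * grad
        err = np.sum((x - x_star) ** 2)
        history.append(err / denom)
        if err < tol:  # stopping rule as stated: ||x^k - x*||^2 < tol
            break
    return x, x_star, history


# Usage on a small synthetic problem (sizes chosen only for illustration).
rng = np.random.default_rng(1)
n, d = 200, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d)
lam = 1.0 / n  # lambda = 1/n, as in the setup above
L_max = np.max(np.sum(A ** 2, axis=1)) + lam  # largest per-sample smoothness constant
x, x_star, history = run_ridge_sgd(A, b, lam=lam, tau=n // 100, gamma=1.0 / (2 * L_max))
```

Because SGD with a constant step size converges linearly only to a neighbourhood of $x^{\ast}$, the maximum epoch budget acts as a safeguard when the tolerance is not reached. With n = 4912 as in the listed configuration, τ = n/100 is not an integer; the sketch rounds it down with `n // 100`, which is another assumption.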