SGD: General Analysis and Improved Rates

Authors: Robert Mansel Gower, Nicolas Loizou, Xun Qian, Alibek Sailanbayev, Egor Shulgin, Peter Richtárik

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we empirically validate our theoretical results. We perform three experiments in each of which we highlight a different aspect of our contributions.
Researcher Affiliation | Academia | (1) Telecom ParisTech, LTCI, Université Paris-Saclay, France; (2) University of Edinburgh, United Kingdom; (3) King Abdullah University of Science and Technology, Kingdom of Saudi Arabia; (4) Moscow Institute of Physics and Technology, Russian Federation.
Pseudocode | No | The paper presents mathematical update rules for SGD (e.g., equation 6) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement or link regarding the public availability of source code for the methodology described.
Open Datasets | Yes | For our experiments on real data we choose several LIBSVM (Chang & Lin, 2011) datasets. ... Ridge regression problem (first row): on left synthetic data, on right real dataset: abalone from LIBSVM. Logistic regression problem (second row): on left synthetic data, on right real data-set: a1a from LIBSVM. ... Above: the w3a data-set from LIBSVM.
Dataset Splits | No | The paper mentions using synthetic and real datasets but does not explicitly provide details about training/validation/test splits, such as percentages or sample counts.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions using LIBSVM but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | In all experiments, to evaluate SGD we use the relative error measure $\lVert x^k - x^* \rVert^2 / \lVert x^0 - x^* \rVert^2$. For all implementations, the starting point $x^0$ is sampled from the standard Gaussian. We run each method until $\lVert x^k - x^* \rVert^2 < 10^{-3}$ or until a prespecified maximum number of epochs is achieved. ... In all experiments $\lambda = 1/n$. ... $n = 4912$, $d = 300$, $\lambda = 100/n$, $\epsilon = 10^{-3}$, $\tau = n/100$
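
The evaluation protocol quoted in the Experiment Setup row (Gaussian starting point, relative error against the minimizer, stop at a tolerance or after a maximum number of epochs) can be sketched as follows. This is a minimal illustration, not the authors' code, which the paper does not release. The ridge objective f(x) = 1/(2n)·||Ax - b||^2 + (λ/2)·||x||^2, the constant step size 1/(2·L_max), the uniform minibatch sampling without replacement, the small synthetic problem, and the name `sgd_ridge` are all assumptions made for the example.

```python
import numpy as np


def sgd_ridge(A, b, lam, tau, eps=1e-3, max_epochs=50, seed=0):
    """Minibatch SGD on an assumed ridge objective
    f(x) = 1/(2n) * ||Ax - b||^2 + (lam/2) * ||x||^2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape

    # Closed-form minimizer x*, used only to measure the error.
    x_star = np.linalg.solve(A.T @ A / n + lam * np.eye(d), A.T @ b / n)

    # Assumed step size: 1 / (2 * L_max), L_max = max_i ||a_i||^2 + lam.
    gamma = 1.0 / (2.0 * (np.max(np.sum(A ** 2, axis=1)) + lam))

    x = rng.standard_normal(d)  # starting point from the standard Gaussian
    init_err = np.sum((x - x_star) ** 2)
    rel_errors = [1.0]  # relative error at iteration 0 is 1 by definition

    for _ in range(max_epochs * (n // tau)):
        # Sample tau indices uniformly without replacement.
        idx = rng.choice(n, size=tau, replace=False)
        grad = A[idx].T @ (A[idx] @ x - b[idx]) / tau + lam * x
        x = x - gamma * grad

        err = np.sum((x - x_star) ** 2)
        rel_errors.append(err / init_err)
        if err < eps:  # stopping rule: squared distance below tolerance
            break

    return x, x_star, np.array(rel_errors)


# Small synthetic problem (sizes chosen for illustration only).
data_rng = np.random.default_rng(1)
n, d = 200, 10
A = data_rng.standard_normal((n, d))
b = A @ data_rng.standard_normal(d)

x, x_star, rel_errors = sgd_ridge(A, b, lam=1.0 / n, tau=max(1, n // 100))
print(f"iterations: {len(rel_errors) - 1}, final relative error: {rel_errors[-1]:.2e}")
```

Here `lam=1.0 / n` and `tau = n // 100` mirror the λ = 1/n and τ = n/100 settings quoted above. The step size is a conservative choice for the sketch and is not taken from the paper.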