SEGA: Variance Reduction via Gradient Sketching

Authors: Filip Hanzely, Konstantin Mishchenko, Peter Richtarik

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this section we perform numerical experiments to illustrate the potential of SEGA. Firstly, in Section 5.1, we compare it to projected gradient descent (PGD) algorithm. Then in Section 5.2, we study the performance of zeroth-order SEGA (when sketched gradients are being estimated through function value evaluations) and compare it to the analogous zeroth-order method. Lastly, in Section 5.3 we verify the claim from Remark 3.6 that in some applications, particular sketches and metric might lead to a significantly faster convergence. In the experiments where theory-supported stepsizes were used, we obtained them by precomputing strong convexity and smoothness measures.
Researcher Affiliation Academia Filip Hanzely1 Konstantin Mishchenko1 Peter Richt arik1,2,3 1 King Abdullah University of Science and Technology, 2University of Edinburgh, 3Moscow Institute of Physics and Technology
Pseudocode Yes Algorithm 1: SEGA: Sk Etched Gr Adient Method; Algorithm 2: ASEGA: Accelerated SEGA
Open Source Code No The paper does not contain any explicit statements or links indicating that the source code for the methodology described is publicly available.
Open Datasets No For f, we choose 4 different quadratics, see Table 2 (appendix). We stress that these are synthetic problems generated for the purpose of illustrating the potential of our method against a natural baseline. For illustration, we choose f to be a quadratic problem based on Table 2 and compare both Gaussian and coordinate sketches. The paper uses synthetic problems or refers to a table for problem parameters, but does not provide access information or citations for any publicly available or open datasets.
Dataset Splits No The paper describes synthetic problems and experimental setups but does not provide specific dataset split information (percentages, sample counts, or methodology) needed for reproduction.
Hardware Specification No The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies No The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment.
Experiment Setup Yes Stepsizes 1/λmax(M) and 1/(nλmax(M)) were used for PGD and SEGA, respectively. Theory supported stepsizes were chosen for both methods. Stepsize chosen according to theory.