Robustness Analysis of Non-Convex Stochastic Gradient Descent using Biased Expectations

Authors: Kevin Scaman, Cédric Malherbe

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we illustrate the practical implications of the results obtained in the paper. [...] The set of experiments consists in finding the parameters x ∈ ℝ^d of a ridge regression that minimize the empirical penalized loss f(x) = ‖Y − ξx‖² + λ‖x‖² over the Airfoil Self-Noise Data Set taken from the UCI machine learning repository [34], denoted here by (Y, ξ) ∈ ℝ^n × ℝ^{n×d}, where n = 1503 and d = 5, and with a regularization parameter set to λ = 10. (A minimal sketch of this objective follows the table.)
Researcher Affiliation | Industry | Kevin Scaman, Cédric Malherbe (Huawei Noah's Ark Lab)
Pseudocode | Yes | Algorithm 1: Stochastic gradient descent (SGD). Input: iterations T, gradient step η, initial state x_0. Output: optimizer x_T. 1: for t = 0 to T−1 do; 2: compute G_t, the noisy approximation of ∇f(x_t); 3: x_{t+1} = x_t − η G_t; 4: end for; 5: return x_T. (A runnable transcription follows the table.)
Open Source Code | No | The paper provides no repository link and makes no explicit statement about releasing source code for the methodology.
Open Datasets | Yes | The set of experiments consists in finding the parameters x ∈ ℝ^d of a ridge regression that minimize the empirical penalized loss f(x) = ‖Y − ξx‖² + λ‖x‖² over the Airfoil Self-Noise Data Set taken from the UCI machine learning repository [34], denoted here by (Y, ξ) ∈ ℝ^n × ℝ^{n×d}, where n = 1503 and d = 5, and with a regularization parameter set to λ = 10.
Dataset Splits | No | The paper mentions running experiments with a budget of T = 10^5 iterations, but does not specify any training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | We considered three different step-size scenarios: (1) constant step-size η_t = 10^(-4); (2) η_t = 10^(-4) · t^(-1/b), as provided by Theorem 17; and (3) the standard η_t = 10^(-4) · t^(-1/2) traditionally used in SGD. For each step-size, we ran 1000 repetitions of Alg. 1 with a budget of T = 10^5 iterations, starting from the solution of the non-penalized problem x_0 = (ξ^T ξ)^(-1) ξ^T Y. (The schedules and initialization are sketched below.)
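
For concreteness, here is a minimal NumPy sketch of the penalized objective f(x) = ‖Y − ξx‖² + λ‖x‖² and its gradient, as quoted in the Research Type and Open Datasets rows. The synthetic (Y, ξ) below is an illustrative stand-in with the quoted shapes (n = 1503, d = 5); the authors released no code, so this is not their implementation.

```python
import numpy as np

# Illustrative stand-ins for the Airfoil Self-Noise data (n = 1503, d = 5);
# the real (Y, xi) would be loaded from the UCI repository.
rng = np.random.default_rng(0)
n, d, lam = 1503, 5, 10.0
xi = rng.standard_normal((n, d))   # design matrix, shape (n, d)
Y = rng.standard_normal(n)         # targets, shape (n,)

def f(x):
    """Penalized ridge loss f(x) = ||Y - xi x||^2 + lam ||x||^2."""
    r = Y - xi @ x
    return r @ r + lam * x @ x

def grad_f(x):
    """Full gradient: -2 xi^T (Y - xi x) + 2 lam x."""
    return -2.0 * xi.T @ (Y - xi @ x) + 2.0 * lam * x
```

Algorithm 1 itself transcribes directly. The sketch below (reusing the names defined above) treats G_t as a single-sample stochastic gradient of the penalized loss; that sampling scheme is an assumption on our part, since the quoted excerpt leaves the noise model of G_t abstract.

```python
def sgd(grad_estimate, x0, T, step):
    """Algorithm 1 (SGD): x_{t+1} = x_t - eta_t * G_t for t = 0, ..., T-1."""
    x = x0.copy()
    for t in range(T):
        x = x - step(t) * grad_estimate(x)
    return x

def single_sample_grad(x):
    """One-sample estimate of grad f(x) (assumed noise model, not the paper's).

    The data-fit term sums over n samples, so a single row's gradient is
    rescaled by n to keep the estimate unbiased.
    """
    i = rng.integers(n)
    return -2.0 * n * xi[i] * (Y[i] - xi[i] @ x) + 2.0 * lam * x
```

Finally, the three step-size schedules and the initialization x_0 = (ξ^T ξ)^(-1) ξ^T Y from the Experiment Setup row, again building on the snippets above. The exponent b comes from the paper's Theorem 17 and depends on quantities not quoted here, so the value below is a placeholder; the paper reports statistics over 1000 independent runs per schedule, which is omitted for brevity.

```python
T = 10**5
b = 4.0  # placeholder: Theorem 17's exponent is problem-dependent (assumption)

# Initialization at the non-penalized least-squares solution (xi^T xi)^{-1} xi^T Y.
x0 = np.linalg.solve(xi.T @ xi, xi.T @ Y)

schedules = {
    "constant":   lambda t: 1e-4,                          # scenario (1)
    "theorem_17": lambda t: 1e-4 * (t + 1) ** (-1.0 / b),  # scenario (2); t+1 avoids 0^neg at t = 0
    "classic":    lambda t: 1e-4 * (t + 1) ** (-0.5),      # scenario (3)
}

# Single run per schedule (the paper averages over 1000 runs).
for name, eta in schedules.items():
    x_T = sgd(single_sample_grad, x0, T, eta)
    print(f"{name}: f(x_T) = {f(x_T):.3f}")
```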
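The dictionary-of-schedules layout is just one convenient way to compare the three quoted step-size choices side by side under an identical budget and initialization; any per-run statistics the paper reports would require repeating each call 1000 times.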
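Note that with a constant η_t = 10^(-4) and single-sample gradients rescaled by n, individual steps can be large; the decaying schedules (2) and (3) are precisely the regimes the paper's robustness analysis compares against this constant baseline.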