Robustness Analysis of Non-Convex Stochastic Gradient Descent using Biased Expectations
Authors: Kevin Scaman, Cédric Malherbe
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we illustrate the practical implications of the results obtained in the paper. [...] The set of experiments consists in finding the parameters $x \in \mathbb{R}^d$ of a ridge regression that minimize the empirical penalized loss $f(x) = \|Y - \xi x\|^2 + \lambda \|x\|^2$ over the Airfoil Self-Noise Data Set taken from the UCI machine learning repository [34], denoted here by $(Y, \xi) \in \mathbb{R}^n \times \mathbb{R}^{n \times d}$, where $n = 1503$ and $d = 5$, and with a regularization parameter set to $\lambda = 10$. |
| Researcher Affiliation | Industry | Kevin Scaman, Cédric Malherbe, Huawei Noah's Ark Lab |
| Pseudocode | Yes | Algorithm 1 Stochastic gradient descent (SGD). Input: iterations $T$, gradient step $\eta$, initial state $x_0$. Output: optimizer $x_T$. 1: for $t = 0$ to $T-1$ do 2: Compute $G_t$, the noisy approximation of $\nabla f(x_t)$ 3: $x_{t+1} = x_t - \eta G_t$ 4: end for 5: return $x_T$ (a runnable sketch of this loop appears after the table) |
| Open Source Code | No | The paper does not provide any specific repository link or explicit statement about the release of source code for the methodology. |
| Open Datasets | Yes | The set of experiments consists in finding the parameters $x \in \mathbb{R}^d$ of a ridge regression that minimize the empirical penalized loss $f(x) = \|Y - \xi x\|^2 + \lambda \|x\|^2$ over the Airfoil Self-Noise Data Set taken from the UCI machine learning repository [34], denoted here by $(Y, \xi) \in \mathbb{R}^n \times \mathbb{R}^{n \times d}$, where $n = 1503$ and $d = 5$, and with a regularization parameter set to $\lambda = 10$. |
| Dataset Splits | No | The paper mentions running experiments with a budget of $T = 10^5$ iterations, but does not specify any training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | We considered three different step-size scenarios: (1) constant step-size $\eta_t = 10^{-4}$, (2) $\eta_t = 10^{-4}\, t^{-1/b}$ provided by Theorem 17, and (3) the standard $\eta_t = 10^{-4}\, t^{-1/2}$ traditionally used in SGD. For each step-size, we ran Alg. 1 1000 times with a budget of $T = 10^5$ iterations, starting from the solution of the non-penalized problem $x_0 = (\xi^\top \xi)^{-1} \xi^\top Y$. (See the experiment sketch after the table.) |
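To make the pseudocode row concrete, here is a minimal runnable sketch of the Algorithm 1 loop. The `grad_oracle` and `step_size` callables are an assumed interface; the paper only states that $G_t$ is a noisy approximation of $\nabla f(x_t)$.

```python
import numpy as np

def sgd(grad_oracle, x0, T, step_size):
    """Minimal sketch of Algorithm 1 (SGD).

    grad_oracle(x, t): returns G_t, a noisy approximation of grad f(x_t)
                       (hypothetical interface; not specified in the paper).
    step_size(t):      returns the step eta_t; a constant function recovers
                       the fixed-step version shown in the pseudocode.
    """
    x = np.asarray(x0, dtype=float).copy()
    for t in range(T):
        G = grad_oracle(x, t)        # step 2: noisy gradient estimate G_t
        x = x - step_size(t) * G     # step 3: x_{t+1} = x_t - eta_t * G_t
    return x                         # step 5: return x_T
```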
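The experiment-setup row can be sketched end to end as below, under several loudly flagged assumptions: the Airfoil Self-Noise data is replaced by a synthetic stand-in (the paper loads the real set from the UCI repository), the single-sample gradient oracle and the tail exponent `b` in the Theorem 17 schedule are not fixed by the excerpt, and the paper averages over 1000 independent runs rather than the single run per schedule shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the Airfoil Self-Noise data (n = 1503, d = 5);
# features are scaled so the quoted 1e-4 steps are stable on this fake set.
n, d, lam = 1503, 5, 10.0
xi = 0.1 * rng.standard_normal((n, d))
Y = rng.standard_normal(n)

def noisy_grad(x):
    # Single-sample estimate of grad f(x), f(x) = ||Y - xi x||^2 + lam ||x||^2.
    # Sampling one row uniformly gives E[G] = grad f(x); this oracle is an
    # assumption, since the excerpt does not say how G_t is built.
    i = rng.integers(n)
    return 2.0 * n * (xi[i] @ x - Y[i]) * xi[i] + 2.0 * lam * x

# Initial point: solution of the non-penalized least-squares problem.
x0 = np.linalg.solve(xi.T @ xi, xi.T @ Y)

T = 10**5
b = 2.0  # placeholder for the exponent in Theorem 17 (not in the excerpt)
schedules = {                        # (t + 1) avoids the t = 0 singularity
    "(1) constant":   lambda t: 1e-4,
    "(2) Theorem 17": lambda t: 1e-4 * (t + 1) ** (-1.0 / b),
    "(3) classical":  lambda t: 1e-4 * (t + 1) ** (-0.5),
}

for name, eta in schedules.items():
    x = x0.copy()
    for t in range(T):               # one run; the paper averages 1000 runs
        x = x - eta(t) * noisy_grad(x)
    f = np.sum((Y - xi @ x) ** 2) + lam * np.sum(x ** 2)
    print(f"{name}: f(x_T) = {f:.4f}")
```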