Step Size Matters in Deep Learning

Authors: Kamil Nar, Shankar Sastry

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To demonstrate that small changes in the step size could lead to significantly different solutions, we generated a piecewise continuous function f : [0, 1] → R and estimated it with a two-layer network by minimizing... with two different step sizes δ ∈ {2×10⁻⁴, 3×10⁻⁴}, where W ∈ R^{1×20}, V ∈ R^{20}, b ∈ R^{20}, N = 1000 and xᵢ = i/N for all i ∈ [N]. The initial values of W, V and the constant vector b were all drawn from independent standard normal distributions; and the vector b was kept the same for both of the step sizes used. As shown in Figure 2, training with δ = 2×10⁻⁴ converged to a fixed solution, which provided an estimate f̂ close to the original function f.
Researcher Affiliation | Academia | Kamil Nar, S. Shankar Sastry, Electrical Engineering and Computer Sciences, University of California, Berkeley
Pseudocode | No | The paper contains mathematical equations and proofs but no structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code for the experiment is available at https://github.com/nar-k/NeurIPS-2018.
Open Datasets | No | The paper generated a synthetic dataset for its experiment ("we generated a piecewise continuous function f : [0, 1] → R and estimated it with a two-layer network by minimizing... N = 1000 and xᵢ = i/N for all i ∈ [N]"), but it does not provide access information for this specific generated data.
Dataset Splits | No | The paper describes using a dataset of N = 1000 points for its experiment but does not specify any training, validation, or test splits.
Hardware Specification | No | The paper does not specify the hardware (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper describes the methods and activations used (e.g., "ReLU activations", "gradient descent algorithm") but does not list specific software dependencies with version numbers (e.g., Python 3.x, TensorFlow x.x, PyTorch x.x).
Experiment Setup | Yes | To demonstrate that small changes in the step size could lead to significantly different solutions, we generated a piecewise continuous function f : [0, 1] → R and estimated it with a two-layer network by minimizing... with two different step sizes δ ∈ {2×10⁻⁴, 3×10⁻⁴}, where W ∈ R^{1×20}, V ∈ R^{20}, b ∈ R^{20}, N = 1000 and xᵢ = i/N for all i ∈ [N]. The initial values of W, V and the constant vector b were all drawn from independent standard normal distributions; and the vector b was kept the same for both of the step sizes used. (A reproduction sketch of this setup follows the table.)
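The quoted setup is concrete enough to sketch in code. Below is a minimal NumPy sketch, not the authors' released implementation (that is at https://github.com/nar-k/NeurIPS-2018): the exact piecewise target f, the random seed, and the iteration count are assumptions, since the excerpt does not specify them. The excerpt calls b a "constant vector", so the sketch leaves b untrained; whether that matches the released code is also an assumption.

```python
import numpy as np

# Reproduction sketch of the two-layer ReLU experiment described above.
# ASSUMED (not in the excerpt): the target function f, the seed, and n_steps.
rng = np.random.default_rng(0)

N = 1000
x = np.arange(1, N + 1) / N                 # x_i = i/N for i in [N]
f = np.where(x < 0.5, 0.2, 0.8) + 0.1 * x   # assumed piecewise continuous target

b = rng.standard_normal(20)                 # "constant vector b": shared across runs, untrained

def train(step_size, n_steps=100_000):
    """Full-batch gradient descent on the mean squared error."""
    W = rng.standard_normal(20)             # W in R^{1x20}, stored flat
    V = rng.standard_normal(20)             # V in R^{20}
    for _ in range(n_steps):
        pre = np.outer(x, V) + b            # (N, 20) pre-activations V x_i + b
        h = np.maximum(pre, 0.0)            # ReLU
        y = h @ W                           # network output f_hat(x_i), shape (N,)
        g = 2.0 / N * (y - f)               # d(MSE)/dy_i
        grad_W = g @ h                      # gradient w.r.t. W
        grad_V = ((g[:, None] * (pre > 0) * W) * x[:, None]).sum(axis=0)
        W -= step_size * grad_W
        V -= step_size * grad_V
    return W, V

for delta in (2e-4, 3e-4):                  # the paper's two step sizes
    W, V = train(delta)
    y_hat = np.maximum(np.outer(x, V) + b, 0.0) @ W
    print(f"step size {delta:.0e}: final MSE = {np.mean((y_hat - f) ** 2):.4f}")
```

Under this setup, the smaller step size should settle to a fixed solution while the larger one may oscillate or fail to converge, mirroring the behavior the paper's Figure 2 illustrates; the exact outcome here depends on the assumed target and seed.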