Step Size Matters in Deep Learning
Authors: Kamil Nar, Shankar Sastry
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate that small changes in the step size could lead to significantly different solutions, we generated a piecewise continuous function f : [0, 1] → ℝ and estimated it with a two-layer network by minimizing... with two different step sizes δ ∈ {2×10⁻⁴, 3×10⁻⁴}, where W ∈ ℝ^{1×20}, V ∈ ℝ^{20}, b ∈ ℝ^{20}, N = 1000 and xᵢ = i/N for all i ∈ [N]. The initial values of W, V and the constant vector b were all drawn from independent standard normal distributions; and the vector b was kept the same for both of the step sizes used. As shown in Figure 2, training with δ = 2×10⁻⁴ converged to a fixed solution, which provided an estimate f̂ close to the original function f. |
| Researcher Affiliation | Academia | Kamil Nar S. Shankar Sastry Electrical Engineering and Computer Sciences University of California, Berkeley |
| Pseudocode | No | The paper contains mathematical equations and proofs but no structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for the experiment is available at https://github.com/nar-k/NeurIPS-2018. |
| Open Datasets | No | The paper generated a synthetic dataset for its experiment ("we generated a piecewise continuous function f : [0, 1] → ℝ and estimated it with a two-layer network by minimizing... N = 1000 and xᵢ = i/N for all i ∈ [N]"), but it does not provide access information for this specific generated data. |
| Dataset Splits | No | The paper describes using a dataset of N=1000 points for its experiment but does not specify any training, validation, or test splits. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper describes the methods and activations used (e.g., "ReLU activations", "gradient descent algorithm") but does not list specific software dependencies with version numbers (e.g., Python 3.x, TensorFlow x.x, PyTorch x.x). |
| Experiment Setup | Yes | To demonstrate that small changes in the step size could lead to significantly different solutions, we generated a piecewise continuous function f : [0, 1] → ℝ and estimated it with a two-layer network by minimizing... with two different step sizes δ ∈ {2×10⁻⁴, 3×10⁻⁴}, where W ∈ ℝ^{1×20}, V ∈ ℝ^{20}, b ∈ ℝ^{20}, N = 1000 and xᵢ = i/N for all i ∈ [N]. The initial values of W, V and the constant vector b were all drawn from independent standard normal distributions; and the vector b was kept the same for both of the step sizes used. (A minimal sketch of this setup appears after the table.) |
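
The Experiment Setup row specifies the model dimensions, the data, and the two step sizes, so the reported experiment can be sketched directly. The snippet below is a minimal NumPy sketch of that setup, assuming a mean-squared-error objective (1/N) Σᵢ (f(xᵢ) − W·ReLU(V·xᵢ + b))²; the piecewise function `target_f`, the iteration count, and the random seed are placeholders not fixed by the excerpt above. The authors' actual code is at https://github.com/nar-k/NeurIPS-2018.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption, not from the paper

# Data: x_i = i/N on [0, 1] with N = 1000, labelled by a piecewise function.
N = 1000
x = np.arange(1, N + 1) / N

def target_f(x):
    # Hypothetical piecewise continuous function on [0, 1]; the excerpt does
    # not reproduce the paper's exact choice of f.
    return np.where(x < 0.5, 2.0 * x, 1.0 - x)

y = target_f(x)

def train(step_size, W0, V0, b, n_iters=50_000):
    """Plain gradient descent on the two-layer ReLU model y_hat = W @ relu(V*x + b).

    The iteration count is an assumption; the paper does not state it here.
    """
    W, V = W0.copy(), V0.copy()
    for _ in range(n_iters):
        pre = np.outer(V, x) + b[:, None]             # (20, N) pre-activations
        h = np.maximum(pre, 0.0)                      # ReLU hidden units
        err = W @ h - y[None, :]                      # (1, N) residuals
        grad_W = (2.0 / N) * err @ h.T                # (1, 20), matches W
        # dL/dV_j = (2/N) * sum_i err_i * W_j * 1[pre_{j,i} > 0] * x_i
        grad_V = (2.0 / N) * ((W.T * (pre > 0)) @ (err.ravel() * x))
        W -= step_size * grad_W
        V -= step_size * grad_V
    return W, V

# W in R^{1x20}, V in R^{20}, b in R^{20}, drawn from standard normals.
# The same initialization (including the fixed b) is reused for both step
# sizes; the paper only states explicitly that b is kept the same.
W0 = rng.standard_normal((1, 20))
V0 = rng.standard_normal(20)
b = rng.standard_normal(20)

for delta in (2e-4, 3e-4):
    W, V = train(delta, W0, V0, b)
    mse = np.mean((W @ np.maximum(np.outer(V, x) + b[:, None], 0.0) - y) ** 2)
    print(f"step size {delta:g}: final training MSE = {mse:.4g}")
```

Under the paper's account, the smaller step size δ = 2×10⁻⁴ converges to a fixed solution whose estimate f̂ is close to f, while δ = 3×10⁻⁴ behaves noticeably differently; this sketch only illustrates that comparison and may not reproduce the exact curves in Figure 2.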