Gradient Dynamics of Shallow Univariate ReLU Networks
Authors: Francis Williams, Matthew Trager, Daniele Panozzo, Claudio Silva, Denis Zorin, Joan Bruna
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present a theoretical and empirical study of the gradient dynamics of overparameterized shallow ReLU networks with one-dimensional input, solving least-squares interpolation. For our numerical experiments, we use gradient descent with the parameterization (1) and α(m) = m, appropriately scaling the weights a, b, c to achieve different dynamical behaviors. We also refer to Section D in the Appendix for additional experiments. |
| Researcher Affiliation | Academia | Francis Williams, Matthew Trager, Claudio Silva, Daniele Panozzo, Denis Zorin, Joan Bruna; New York University |
| Pseudocode | No | The paper describes methods through mathematical formulations and textual descriptions, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for their methodology is publicly available. |
| Open Datasets | No | The paper mentions using "10 points sampled from a square wave" and fitting to a "sinusoid" for numerical experiments. These are either custom or synthetic datasets, and no concrete access information (link, DOI, formal citation) for a publicly available dataset is provided. |
| Dataset Splits | No | The paper mentions using "10 points sampled from a square wave" and fitting to a "sinusoid", but it does not specify any training, validation, or test dataset splits. The only numerical detail given for training is "10000 epochs" for some experiments. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., specific libraries or frameworks like PyTorch or TensorFlow with their versions). |
| Experiment Setup | Yes | For our numerical experiments, we use gradient descent with the parameterization (1) and α(m) = m, appropriately scaling the weights a, b, c to achieve different dynamical behaviors. We show in Figure 4 that as we vary δ, the network function goes from being smooth and non-adaptive in the kernel regime (δ = 0, i.e., training only the parameter c) to very adaptive (δ = ∞, i.e., training only the parameters a, b). Note that as δ increases, clusters of knots emerge at the sample positions (collinear points in the uv diagrams). The caption of Figure 4 states "(10000 epochs)". A minimal code sketch of this setup follows the table. |
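
The setup described in the Research Type and Experiment Setup rows can be illustrated with a short, self-contained sketch. This is not the authors' code: the network f(x) = (1/m) Σ c_i ReLU(a_i x + b_i) with α(m) = m, the 10-point square-wave target, and the 10000-epoch full-batch gradient descent come from the paper, while the concrete way δ rescales the weights, the learning rate, and the width are assumptions chosen only to reproduce the qualitative kernel-versus-adaptive behavior.

```python
import numpy as np

# Sketch of the experiment: an overparameterized shallow ReLU network with
# 1-D input,
#     f(x) = (1/m) * sum_i c_i * relu(a_i * x + b_i),   # i.e. alpha(m) = m
# trained by full-batch gradient descent on least-squares interpolation of
# 10 points sampled from a square wave.

rng = np.random.default_rng(0)

m = 500          # hidden units (overparameterized); width is an assumption
delta = 1.0      # small delta: kernel-like (only c moves); large delta: adaptive
lr = 0.5         # learning rate is an assumption
epochs = 10_000  # matches the "(10000 epochs)" noted in the Figure 4 caption

# 10 points sampled from a square wave (synthetic target, as in the paper)
x = np.linspace(-1.0, 1.0, 10)
y = np.sign(np.sin(2.0 * np.pi * x))

# Initialization: scaling c by delta is one simple choice (an assumption, not
# necessarily the paper's exact scaling). The (a, b) gradients are proportional
# to c, so larger delta speeds up the knot dynamics relative to c.
a = rng.standard_normal(m)
b = rng.standard_normal(m)
c = rng.standard_normal(m) * delta

def forward(x):
    pre = np.outer(x, a) + b            # (n, m) pre-activations a_i * x + b_i
    return np.maximum(pre, 0.0) @ c / m

for _ in range(epochs):
    pre = np.outer(x, a) + b
    act = np.maximum(pre, 0.0)
    pred = act @ c / m
    r = pred - y                        # residuals of the squared loss
    mask = (pre > 0).astype(float)      # ReLU derivative
    # Gradients of L = mean(r^2) / 2 with respect to c, b, a
    grad_c = act.T @ r / (m * len(x))
    grad_b = (mask * c).T @ r / (m * len(x))
    grad_a = (mask * c).T @ (r * x) / (m * len(x))
    a -= lr * grad_a
    b -= lr * grad_b
    c -= lr * grad_c

# Knots of the learned piecewise-linear function sit at -b_i / a_i; in the
# adaptive regime (large delta) they tend to cluster near the sample positions.
print("final training MSE:", np.mean((forward(x) - y) ** 2))
```

Sweeping delta over several orders of magnitude and plotting forward on a dense grid gives a Figure 4-style comparison: near-zero delta yields a smooth, non-adaptive fit, while large delta produces an adaptive fit with knots concentrated at the data points.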