Multi-Layer Neural Networks as Trainable Ladders of Hilbert Spaces
Authors: Zhengdao Chen
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we examine linear and shallow NNs from the new perspective and complement the theory with numerical results. |
| Researcher Affiliation | Industry | Google Research, Mountain View, CA, USA. |
| Pseudocode | No | The paper does not contain any blocks explicitly labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating the release of open-source code for the described methodology. |
| Open Datasets | No | The paper uses synthetic data for its numerical experiments (e.g., 'We choose d = 10, n = 50 and ν = N(0, Id).'), but it does not mention or provide access information for any publicly available or open datasets. |
| Dataset Splits | No | The paper describes synthetic data generation and sample sizes but does not provide specific training/validation/test dataset splits needed for reproduction. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not list specific software components with version numbers that would be necessary to reproduce the experiments. |
| Experiment Setup | Yes | We choose d = 10, n = 50 and ν = N(0, Id). In Figure 2, we plot the learning trajectories in the linear model space projected into the first two dimensions, i.e., vt,1 and vt,2. We see that the NHL dynamics solved by numerical integration closely predicts the actual GD dynamics when the width is large. Moreover, the NHL dynamics presents a nonlinear learning trajectory in the space of linear models, which is in contrast with, for example, the linear learning trajectory of performing linear regression under the population loss. We choose d = 1, n = 20, m = 512, the target function being f(x) = sin(2x), and ν being the uniform distribution on [0, 2π]. All parameters in the model, including untrained bias terms, are sampled i.i.d. from N(0, 1) at initialization. |
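The setup quoted above is concrete enough to sketch the synthetic data generation. The following is a minimal sketch under stated assumptions: the random seed, the ReLU activation, and the 1/√m output scaling are all assumptions on our part, since the excerpt does not specify them; only the dimensions, sample sizes, input distributions, target function, and N(0, 1) initialization come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption; not given in the paper

# Experiment 1 (linear model space): d = 10, n = 50, inputs x ~ N(0, I_d)
d1, n1 = 10, 50
X1 = rng.standard_normal((n1, d1))

# Experiment 2 (shallow NN): d = 1, n = 20, width m = 512,
# target f(x) = sin(2x), inputs x ~ Uniform[0, 2π]
d2, n2, m = 1, 20, 512
X2 = rng.uniform(0.0, 2 * np.pi, size=(n2, d2))
y2 = np.sin(2 * X2)

# All parameters, including the untrained bias terms, are sampled
# i.i.d. from N(0, 1) at initialization (as stated in the paper).
W = rng.standard_normal((m, d2))   # first-layer weights
b = rng.standard_normal(m)         # untrained bias terms
a = rng.standard_normal(m)         # output-layer weights

def forward(X):
    """Width-m shallow network; ReLU and 1/sqrt(m) scaling are assumptions."""
    h = np.maximum(X @ W.T + b, 0.0)   # hidden layer, shape (n, m)
    return h @ a / np.sqrt(m)          # output, shape (n,)

preds = forward(X2)
```

This reconstructs only the data and initialization; the NHL dynamics and GD training described in the paper are not reproduced here.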