Multi-Layer Neural Networks as Trainable Ladders of Hilbert Spaces

Author: Zhengdao Chen

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we examine linear and shallow NNs from the new perspective and complement the theory with numerical results."
Researcher Affiliation | Industry | Google Research, Mountain View, CA, USA.
Pseudocode | No | The paper does not contain any blocks explicitly labeled "Pseudocode" or "Algorithm".
Open Source Code | No | The paper does not contain any explicit statement or link indicating the release of open-source code for the described methodology.
Open Datasets | No | The paper uses synthetic data for its numerical experiments (e.g., "We choose d = 10, n = 50 and ν = N(0, Id)."), but it does not mention or provide access information for any publicly available or open datasets.
Dataset Splits | No | The paper describes the synthetic data generation and sample sizes but does not provide the specific training/validation/test splits needed for reproduction.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | The paper does not list specific software components with version numbers that would be necessary to reproduce the experiments.
Experiment Setup | Yes | "We choose d = 10, n = 50 and ν = N(0, Id). In Figure 2, we plot the learning trajectories in the linear model space projected into the first two dimensions, i.e., vt,1 and vt,2. We see that the NHL dynamics solved by numerical integration closely predicts the actual GD dynamics when the width is large. Moreover, the NHL dynamics presents a nonlinear learning trajectory in the space of linear models, which is in contrast with, for example, the linear learning trajectory of performing linear regression under the population loss. We choose d = 1, n = 20, m = 512, the target function being f(x) = sin(2x), and ν being the uniform distribution on [0, 2π]. All parameters in the model, including untrained bias terms, are sampled i.i.d. from N(0, 1) at initialization." (A sketch of the second setup follows the table.)
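The paper does not release code, so the following is only a minimal, illustrative sketch (not the authors' implementation) of the second setup quoted above: a width-512 two-layer network with untrained biases, trained by full-batch gradient descent on n = 20 samples of f(x) = sin(2x) with inputs drawn uniformly from [0, 2π], and all parameters initialized i.i.d. from N(0, 1). The ReLU activation, 1/m output scaling, squared loss, learning rate, and step count are assumptions not stated in the quoted setup.

```python
# Illustrative sketch of the shallow-NN experiment described in the table above.
# Assumptions (not from the paper): ReLU activation, mean-field 1/m output
# scaling, squared loss, learning rate, and number of GD steps.
import numpy as np

rng = np.random.default_rng(0)

d, n, m = 1, 20, 512                              # input dim, samples, width
x = rng.uniform(0.0, 2.0 * np.pi, size=(n, d))    # nu = Uniform[0, 2*pi]
y = np.sin(2.0 * x[:, 0])                         # target f(x) = sin(2x)

# All parameters sampled i.i.d. from N(0, 1) at initialization.
W = rng.standard_normal((m, d))                   # first-layer weights (trained)
b = rng.standard_normal(m)                        # bias terms (kept untrained)
a = rng.standard_normal(m)                        # output-layer weights (trained)

def forward(x):
    """Two-layer ReLU network with assumed 1/m output scaling."""
    h = np.maximum(x @ W.T + b, 0.0)              # hidden features, shape (n, m)
    return h @ a / m, h

lr, steps = 0.1, 2000                             # illustrative hyperparameters
for _ in range(steps):
    pred, h = forward(x)
    resid = pred - y                              # squared-loss residual
    grad_a = h.T @ resid / (n * m)
    mask = (h > 0.0).astype(float)                # ReLU derivative
    grad_W = ((mask * (resid[:, None] * a[None, :] / m)).T @ x) / n
    a -= lr * grad_a
    W -= lr * grad_W

print("final mean-squared training error:", np.mean((forward(x)[0] - y) ** 2))
```

The same skeleton could be adapted to the first setup (d = 10, n = 50, ν = N(0, Id)) by swapping the data generator and tracking the induced linear-model coordinates over training; the specific integration scheme used for the NHL dynamics in Figure 2 is not specified in the quoted text.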