On Learning Over-parameterized Neural Networks: A Functional Approximation Perspective

Authors: Lili Su, Pengkun Yang

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | A numerical illustration of the decay of λmin(H) in n is presented in Fig. 1a. A numerical illustration of the spectrum concentration of K is given in Fig. 1b. Training with f being randomly generated linear or quadratic functions with n = 1000, m = 2000. (A sketch of this kind of Gram-matrix illustration is given below the table.)
Researcher Affiliation | Academia | Lili Su, CSAIL, MIT (lilisu@mit.edu); Pengkun Yang, Department of Electrical Engineering, Princeton University (pengkuny@princeton.edu)
Pseudocode | No | The paper describes the gradient descent update rules and initialization steps in paragraph text and mathematical equations (e.g., (3), (5)) but does not include a formally labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | No | The paper does not provide a statement about, or link to, open-source code for the described methodology.
Open Datasets | No | The paper describes data generation from a distribution (e.g., 'uniform distribution on the spheres') and mentions 'training with f being randomly generated linear or quadratic functions', but it does not cite or link to any publicly available dataset used for its numerical illustrations.
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It mentions a 'training dataset' but gives no percentages or counts for separate splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for its numerical illustrations or computations.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiments.
Experiment Setup | Yes | For each k = 1, …, m/2: initialize w_{2k-1} ~ N(0, I), and set a_{2k-1} = 1 with probability 1/2 and a_{2k-1} = -1 with probability 1/2. Initialize w_{2k} = w_{2k-1} and a_{2k} = -a_{2k-1}. All randomness in this initialization is independent, and is independent of the dataset. In the gradient descent update, η > 0 is the stepsize/learning rate. Training with f being randomly generated linear or quadratic functions with n = 1000, m = 2000. (A minimal training sketch following this setup is given below the table.)
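
The "Research Type" row above points to Fig. 1a (decay of λmin(H) with n) and Fig. 1b (spectrum concentration of K). The following is a minimal sketch, not the authors' code, of how such a Gram-matrix illustration could be reproduced. It assumes inputs drawn uniformly from the unit sphere and takes H to be the arc-cosine/NTK-style ReLU Gram matrix H_ij = <x_i, x_j>(π − arccos<x_i, x_j>)/(2π); the paper's exact definitions of H and K, and the input dimension d, should be taken from the paper itself.

```python
# Minimal sketch (not the authors' code) of the Gram-matrix illustration in the
# "Research Type" row. Hypothetical assumptions: inputs uniform on the unit
# sphere in R^d, and H taken as the arc-cosine/NTK-style ReLU Gram matrix
#   H_ij = <x_i, x_j> * (pi - arccos<x_i, x_j>) / (2*pi);
# the paper's exact H and K (and the dimension d) may differ.
import numpy as np

def sample_sphere(n, d, rng):
    """Draw n points uniformly from the unit sphere in R^d."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def relu_gram(X):
    """Gram matrix H of the ReLU kernel for unit-norm rows of X."""
    cos = np.clip(X @ X.T, -1.0, 1.0)            # cosines <x_i, x_j>
    return cos * (np.pi - np.arccos(cos)) / (2.0 * np.pi)

rng = np.random.default_rng(0)
d = 10
for n in (100, 250, 500, 1000):                  # lambda_min(H) decays as n grows
    H = relu_gram(sample_sphere(n, d, rng))
    lam = np.linalg.eigvalsh(H)                  # eigenvalues in ascending order
    print(f"n={n:5d}  lambda_min(H)={lam[0]:.3e}  lambda_max(H)={lam[-1]:.3e}")
```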
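
The "Experiment Setup" row quotes the paired initialization and gradient descent training with n = 1000 and m = 2000. Below is a minimal sketch, not the authors' code, of that setup for a linear target. Several details are assumptions not stated in the quoted text: the 1/√m output scaling, training only the hidden weights W with the output signs a held fixed, the loss 0.5·||f(X) − y||², the stepsize value, and the exact form of the random linear target f.

```python
# Minimal sketch (not the authors' code) of the quoted initialization and plain
# gradient descent on a two-layer ReLU network with n = 1000, m = 2000.
# Hypothetical choices not stated in the quoted setup: the 1/sqrt(m) output
# scaling, training only the hidden weights W with the signs a fixed, the loss
# 0.5 * ||f(X) - y||^2, the stepsize eta, and the random linear target f.
import numpy as np

rng = np.random.default_rng(0)
n, m, d = 1000, 2000, 10
eta, steps = 0.01, 300                            # eta > 0 is the stepsize

# Data: x_i uniform on the unit sphere, labels from a random linear target f.
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
beta = rng.standard_normal(d)
y = X @ beta                                      # f(x) = <beta, x> (linear case)

# Paired initialization: w_{2k} = w_{2k-1} and a_{2k} = -a_{2k-1}, with all
# other randomness independent of the dataset.
W = np.repeat(rng.standard_normal((m // 2, d)), 2, axis=0)
a = np.repeat(rng.choice([-1.0, 1.0], size=m // 2), 2)
a[1::2] *= -1.0

def predict(W):
    """Network output (1/sqrt(m)) * sum_r a_r * relu(<w_r, x_i>) for every row of X."""
    return (np.maximum(X @ W.T, 0.0) @ a) / np.sqrt(m)

for t in range(steps):
    pre = X @ W.T                                 # (n, m) pre-activations
    err = predict(W) - y                          # (n,) residuals
    # Gradient of 0.5 * ||f(X) - y||^2 with respect to W (a is not updated).
    grad = ((err[:, None] * (pre > 0) * a).T @ X) / np.sqrt(m)
    W -= eta * grad
    if t % 50 == 0:
        print(f"step {t:4d}  train MSE = {np.mean(err ** 2):.4e}")
```

Under this anti-symmetric pairing the two neurons in each pair cancel at initialization, so the network output starts at exactly zero and the first residual equals -y.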