Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions

Authors: Stefano Sarao Mannelli, Eric Vanden-Eijnden, Lenka Zdeborová

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "These results are confirmed by numerical experiments."
Researcher Affiliation | Academia | Université Paris-Saclay, CNRS, CEA, Institut de physique théorique, 91191 Gif-sur-Yvette, France; Courant Institute, New York University, 251 Mercer Street, New York, New York 10012, USA; SPOC laboratory, EPFL, Switzerland.
Pseudocode | No | The paper contains mathematical derivations and descriptions of algorithms (like gradient descent), but it does not include any explicit pseudocode blocks or formally labeled algorithm sections.
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described.
Open Datasets | No | "The teacher produces n outputs y_k = f(x_k) from random i.i.d. Gaussian samples x_k ~ ν = N(0, I_d), k = 1, ..., n." The paper describes a synthetic data-generation process for its experiments rather than using or providing access to a pre-existing, publicly available dataset (a sketch of this process follows the table).
Dataset Splits | No | The paper generates synthetic data (random i.i.d. Gaussian samples) for its experiments. It discusses training on this data via the empirical loss and evaluating the generalization error (population loss), but it gives no explicit training/validation/test splits (percentages or counts) for reproducibility.
Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., CPU, GPU models, memory, or specific computing environments) used to run the numerical experiments or simulations.
Software Dependencies | No | The paper does not specify any software dependencies, such as libraries or frameworks, along with their version numbers, that would be necessary to replicate the experiments.
Experiment Setup | Yes | The figures show the log-average of 100 simulations with d = 8 and, from left to right, m = 2, 4, 8, 16. Fig. 3 shows the training and the population loss observed in the simulation using input dimension d = 8 and a teacher with m = 1 hidden unit. In Fig. 4 we compare the strings obtained for input dimension 4 (red), 6 (purple), and 8 (blue); the strings are parametrized by 100 points represented on the horizontal axes. Moving from the leftmost to the rightmost panels in Fig. 4, the number of samples in the dataset increases, namely n = 8, 12, 16, 20 (a training and evaluation sketch follows the table).
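
The Open Datasets row quotes the paper's teacher-student data generation. Below is a minimal Python/NumPy sketch of that process; the 1/sqrt(d) scaling of the teacher weights and the plain sum over the m hidden units are illustrative assumptions, since the paper's exact normalization is not quoted above.

```python
import numpy as np

def generate_teacher_data(n, d, m, rng=None):
    """Synthetic teacher-student data: inputs x_k ~ N(0, I_d) i.i.d. and
    labels y_k = f(x_k), where f is a shallow network with quadratic
    activation and m hidden units. The 1/sqrt(d) weight scaling is an
    illustrative assumption, not a detail quoted from the paper."""
    rng = np.random.default_rng(rng)
    W_teacher = rng.standard_normal((m, d)) / np.sqrt(d)  # teacher weights (assumed scaling)
    X = rng.standard_normal((n, d))                       # n i.i.d. Gaussian samples in R^d
    y = np.sum((X @ W_teacher.T) ** 2, axis=1)            # f(x) = sum_r (w_r . x)^2
    return X, y, W_teacher
```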
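The Pseudocode and Experiment Setup rows note that the paper trains with gradient descent and reports log-averages over 100 simulations, but includes no pseudocode. The sketch below fills that gap under stated assumptions: full-batch gradient descent on the empirical square loss, with arbitrary learning rate, step count, and sample size n (the paper's values for these are not quoted above). It reuses generate_teacher_data from the previous sketch.

```python
import numpy as np

def train_student(X, y, m_student, lr=1e-3, steps=10_000, rng=None):
    """Full-batch gradient descent on the empirical square loss
    L(W) = (1/2n) sum_k (f_W(x_k) - y_k)^2 for a student network
    f_W(x) = sum_r (w_r . x)^2. Learning rate and step count are
    arbitrary illustrative choices."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    W = rng.standard_normal((m_student, d)) / np.sqrt(d)  # random init (assumed scaling)
    for _ in range(steps):
        A = X @ W.T                                       # (n, m): pre-activations w_r . x_k
        resid = np.sum(A ** 2, axis=1) - y                # student output minus label
        # dL/dw_r = (2/n) sum_k resid_k (w_r . x_k) x_k
        W -= lr * (2.0 / n) * (resid[:, None] * A).T @ X
    return W

def population_loss(W_student, W_teacher, n_test=100_000, rng=None):
    """Monte Carlo estimate of the population (generalization) loss on
    fresh Gaussian inputs; n_test is an arbitrary choice."""
    rng = np.random.default_rng(rng)
    X = rng.standard_normal((n_test, W_teacher.shape[1]))
    resid = (np.sum((X @ W_student.T) ** 2, axis=1)
             - np.sum((X @ W_teacher.T) ** 2, axis=1))
    return 0.5 * np.mean(resid ** 2)

# Log-average of the population loss over 100 simulations, in the style of
# the quoted figures (d = 8; here m = 2 and n = 16 are illustrative picks).
log_losses = []
for seed in range(100):
    X, y, W_t = generate_teacher_data(n=16, d=8, m=2, rng=seed)
    W_s = train_student(X, y, m_student=2, rng=10_000 + seed)
    log_losses.append(np.log(population_loss(W_s, W_t, rng=20_000 + seed)))
print("log-average population loss:", np.mean(log_losses))
```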