Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions
Authors: Stefano Sarao Mannelli, Eric Vanden-Eijnden, Lenka Zdeborová
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | These results are confirmed by numerical experiments. |
| Researcher Affiliation | Academia | Université Paris-Saclay, CNRS, CEA, Institut de physique théorique, 91191 Gif-sur-Yvette, France. Courant Institute, New York University, 251 Mercer Street, New York, New York 10012, USA. SPOC laboratory, EPFL, Switzerland. |
| Pseudocode | No | The paper contains mathematical derivations and descriptions of algorithms (like gradient descent), but it does not include any explicit pseudocode blocks or formally labeled algorithm sections. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | No | The teacher produces n outputs y_k = f(x_k) from random i.i.d. Gaussian samples x_k ∼ N(0, I_d), k = 1, . . . , n. - The paper describes a synthetic data-generation process for its experiments rather than using or providing access to a pre-existing, publicly available dataset. |
| Dataset Splits | No | The paper generates synthetic data (random i.i.d. Gaussian samples) for its experiments. It discusses training on this data via empirical loss and evaluates generalization error (population loss) but does not provide details on explicit training, validation, or test dataset splits in terms of percentages or counts for reproducibility. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., CPU, GPU models, memory, or specific computing environments) used to run the numerical experiments or simulations. |
| Software Dependencies | No | The paper does not specify any software dependencies, such as libraries or frameworks, along with their version numbers that would be necessary to replicate the experiments. |
| Experiment Setup | Yes | The figures show the log-average of 100 simulations with d = 8 and, from left to right, m = 2, 4, 8, 16, respectively. Fig. 3 shows the training and the population loss observed in the simulation using input dimension d = 8 and a teacher with m = 1 hidden unit. In Fig. 4 the strings obtained for input dimension 4 (red), 6 (purple), and 8 (blue) are compared; the strings are parametrized by 100 points represented on the horizontal axes. Moving from the leftmost to the rightmost panels in Fig. 4, the number of samples in the dataset increases, namely n = 8, 12, 16, 20. |
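The Open Datasets row quotes the paper's synthetic data-generation process: a teacher network with quadratic activations labels i.i.d. Gaussian inputs. A minimal sketch of that setup, assuming a teacher of the form f(x) = Σ_j (w_j · x)² (the quadratic-activation sum suggested by the paper's title) and Gaussian teacher weights, which are an illustrative assumption here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Notation follows the audit table: d = input dimension,
# m = teacher hidden units, n = number of samples.
d, m, n = 8, 4, 16

# Teacher weights: the Gaussian initialization is an assumption,
# not something the audit table specifies.
W = rng.standard_normal((m, d))

# x_k ~ N(0, I_d), k = 1, ..., n  (the quoted sampling scheme)
X = rng.standard_normal((n, d))

# y_k = f(x_k) with quadratic activation: sum_j (w_j . x_k)^2
y = np.sum((X @ W.T) ** 2, axis=1)

print(X.shape, y.shape)  # (16, 8) (16,)
```

Since the labels are sums of squares, they are nonnegative by construction; any student network is then fit to (X, y) pairs drawn this way rather than to a pre-existing public dataset.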