Annihilation of Spurious Minima in Two-Layer ReLU Networks
Authors: Yossi Arjevani, Michael Field
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the optimization problem associated with fitting two-layer ReLU neural networks with respect to the squared loss, where labels are generated by a target network. Use is made of the rich symmetry structure to develop a novel set of tools for studying the mechanism by which over-parameterization annihilates spurious minima. Sharp analytic estimates are obtained for the loss and the Hessian spectrum at different minima, and it is proved that adding neurons can turn symmetric spurious minima into saddles; minima of lesser symmetry require more neurons. Using Cauchy's interlacing theorem, we prove the existence of descent directions in certain subspaces arising from the symmetry structure of the loss function. This analytic approach uses techniques, new to the field, from algebraic geometry, representation theory and symmetry breaking, and confirms rigorously the effectiveness of over-parameterization in making the associated loss landscape accessible to gradient-based methods. |
| Researcher Affiliation | Academia | Yossi Arjevani, The Hebrew University (yossi.arjevani@gmail.com); Michael Field, UC Santa Barbara (Mike.field@gmail.com) |
| Pseudocode | No | The paper focuses on theoretical analysis and mathematical proofs, and it does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that source code for the described methodology is publicly available. The ethics statement indicates N/A for code. |
| Open Datasets | No | The paper is theoretical and does not involve training models on datasets. It refers to theoretical data assumptions (e.g., Gaussian distribution) or data used in other works, but not data used by the authors for their own experiments. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments involving data splits. The ethics statement indicates N/A for training details. |
| Hardware Specification | No | The paper is theoretical and does not report on experiments requiring specific hardware. The ethics statement indicates N/A for compute resources. |
| Software Dependencies | No | The paper is theoretical and does not report on experiments requiring specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training configurations. The ethics statement indicates N/A for training details. |
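As a side note on the theoretical content summarized above: the key spectral tool the paper invokes, Cauchy's interlacing theorem, is easy to check numerically. The sketch below is purely illustrative and not from the paper; the random matrix, NumPy, and the tolerance are our own assumptions. It verifies that the eigenvalues of a principal submatrix of a symmetric matrix interlace those of the full matrix.

```python
import numpy as np

# Illustrative only: a random symmetric matrix, not a Hessian from the paper.
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = (M + M.T) / 2          # symmetric n x n matrix
B = A[:-1, :-1]            # leading (n-1) x (n-1) principal submatrix

a = np.linalg.eigvalsh(A)  # eigenvalues in ascending order
b = np.linalg.eigvalsh(B)

# Cauchy interlacing: a[k] <= b[k] <= a[k+1] for k = 0, ..., n-2.
tol = 1e-12
ok = all(a[k] <= b[k] + tol and b[k] <= a[k + 1] + tol for k in range(n - 1))
print("interlacing holds:", ok)
```

In the paper's setting, this is the mechanism behind the descent-direction argument: restricting the Hessian to a symmetry-adapted subspace yields a principal submatrix whose spectrum is pinned between the full Hessian's eigenvalues, so a negative eigenvalue in the subspace certifies a descent direction.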