Understanding the Loss Surface of Neural Networks for Binary Classification
Authors: Shiyu Liang, Ruoyu Sun, Yixuan Li, Rayadurgam Srikant
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | On the positive side, we prove that no bad local minima exist under the following conditions: the neurons (i.e., activation functions) are increasing and strictly convex, the neural network is single-layered or multi-layered with a shortcut-like connection, the surrogate loss function is a smooth version of the hinge loss function, and either the dataset is linearly separable or the positively and negatively labeled samples are located on different subspaces. On the negative side, we provide dozens of counterexamples which show that bad local minima exist when these conditions do not hold. (A toy numerical sketch of these conditions follows the table.) |
| Researcher Affiliation | Collaboration | University of Illinois at Urbana-Champaign; Facebook Research. Correspondence to: R. Srikant <rsrikant@illinois.edu>, Ruoyu Sun <ruoyus@illinois.edu>. |
| Pseudocode | No | The paper contains mathematical derivations, lemmas, and theorems, but it does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper is theoretical, presenting proofs and counterexamples about properties of neural network loss surfaces. It describes no implementation or system for which source code would typically be released. |
| Open Datasets | No | The paper is theoretical and analyzes properties of neural networks under certain data distributions, some of which are abstract (e.g., 'P_{X\|Y=1} is a uniform distribution on the interval [0, 1]'). It does not use or provide access information for any real-world publicly available datasets for training or evaluation. |
| Dataset Splits | No | The paper is theoretical and focuses on mathematical proofs and counterexamples regarding loss surface properties. It does not conduct empirical experiments that would involve dataset splitting (e.g., train/validation/test splits). |
| Hardware Specification | No | The paper is purely theoretical, focusing on mathematical analysis of neural network loss surfaces. It does not describe any computational experiments that would require specific hardware, and thus no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is a theoretical work involving mathematical proofs and counterexamples. It does not describe any computational experiments or require specific software dependencies with version numbers for replication. |
| Experiment Setup | No | The paper is a theoretical study of neural network loss surfaces. It does not describe empirical experiments, and therefore no experimental setup details such as hyperparameters or training configurations are provided. |
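
For readers who want a concrete feel for the paper's main positive result, the sketch below numerically probes the "no bad local minima" claim on a toy instance. This is an illustration under assumptions, not the paper's construction: the quadratically smoothed hinge loss, the softplus activation (increasing and strictly convex, as the theorem requires), the two-neuron width, the 1-D dataset, and the gradient-descent hyperparameters are all hypothetical choices made here.

```python
import numpy as np

rng = np.random.default_rng(0)

def smooth_hinge(z):
    # One common C^1 smoothing of the hinge loss (not necessarily the
    # paper's exact choice): 0 for z >= 1, quadratic on [0, 1), linear below.
    return np.where(z >= 1, 0.0,
                    np.where(z <= 0, 0.5 - z, 0.5 * (1.0 - z) ** 2))

def softplus(t):
    # Increasing and strictly convex, matching the activation conditions.
    return np.logaddexp(0.0, t)

def loss(theta, X, y):
    # One-hidden-layer network with two neurons:
    # f(x) = v1 * s(w1*x + b1) + v2 * s(w2*x + b2)
    w1, b1, v1, w2, b2, v2 = theta
    f = v1 * softplus(w1 * X + b1) + v2 * softplus(w2 * X + b2)
    return smooth_hinge(y * f).mean()

def num_grad(fun, theta, eps=1e-6):
    # Central finite differences; adequate for a toy check.
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (fun(theta + e) - fun(theta - e)) / (2.0 * eps)
    return g

# Linearly separable 1-D data: negatives left of the origin, positives right.
X = np.array([-2.0, -1.5, -1.0, 1.0, 1.5, 2.0])
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])

# Plain gradient descent from many random initializations. If no bad local
# minima exist, we expect every restart to drive the loss toward the same
# (global) minimum value, which is 0 for separable data.
finals = []
for _ in range(20):
    theta = rng.normal(size=6)
    for _ in range(3000):
        theta -= 0.1 * num_grad(lambda t: loss(t, X, y), theta)
    finals.append(loss(theta, X, y))
print("final losses across restarts:", np.round(sorted(finals), 4))
```

Two neurons are used because a single neuron times a scalar output weight has constant sign (softplus is strictly positive), so it could never classify both labels correctly. Note that agreement across restarts on one toy instance only illustrates the theorem's conclusion; the guarantee itself comes from the paper's proofs.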