Empirical Risk Landscape Analysis for Understanding Deep Neural Networks

Authors: Pan Zhou, Jiashi Feng

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This work provides a comprehensive landscape analysis of the empirical risk of deep neural networks (DNNs), covering the convergence behavior of its gradient, its stationary points, and the empirical risk itself toward their population counterparts, and reveals how various network parameters determine convergence performance. In particular, for an $l$-layer linear neural network with $d_i$ neurons in the $i$-th layer, the gradient of its empirical risk is proved to uniformly converge to that of its population risk at the rate $\mathcal{O}\big(r^{2l}\sqrt{l \max_i d_i \, s \log(d/l)/n}\big)$, where $s$ is the weight sparsity, $d$ the network size, $r$ the parameter-magnitude bound, and $n$ the sample size; a small numerical sketch of this bound appears after the table. The paper further proves a one-to-one correspondence between the non-degenerate stationary points of the empirical and population risks, with a convergence guarantee for each pair, establishes uniform convergence of the empirical risk to its population counterpart, and derives stability and generalization bounds. It also analyzes these properties for deep nonlinear neural networks with sigmoid activation functions, proving similar results for the convergence behavior of the empirical risk gradient, the non-degenerate stationary points, and the empirical risk itself. To the best of the authors' knowledge, this is the first work to theoretically characterize the uniform convergence of the gradient and stationary points of the empirical risk of DNN models, which aids the theoretical understanding of how the network depth $l$, the layer widths $d_i$, the network size $d$, the weight sparsity $s$, and the parameter magnitude $r$ shape the neural network landscape.
Researcher Affiliation | Academia | Pan Zhou & Jiashi Feng, Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117583 ({pzhou@u.nus.edu, elefjia@nus.edu.sg})
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link regarding the release of source code.
Open Datasets | No | The paper is theoretical and does not conduct experiments on a specific dataset. It only makes assumptions about the input data, such as 'The input datum $x \in \mathbb{R}^{d_0}$ has zero mean and is $\tau^2$-sub-Gaussian' (Assumption 1) or 'The input sample x obeys Assumption 2 and the activation functions in a deep neural network are sigmoid functions' (Section 5); a minimal sampler satisfying the first assumption is sketched after the table.
Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with dataset splits.
Hardware Specification | No | The paper is theoretical and does not report on experimental hardware specifications.
Software Dependencies | No | The paper is theoretical and does not list any specific software dependencies with version numbers for experimental reproducibility.
Experiment Setup | No | The paper is theoretical and does not include details on experimental setup, hyperparameters, or training configurations.
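
To make the reconstructed gradient-convergence rate concrete, here is a minimal numerical sketch. It assumes the rate $\mathcal{O}\big(r^{2l}\sqrt{l \max_i d_i \, s \log(d/l)/n}\big)$ holds up to an unspecified constant (set to 1 below), and it takes the network size $d$ to be the sum of the layer widths; the function name and example numbers are purely illustrative, not taken from the paper.

```python
import math

def gradient_convergence_bound(r, l, widths, s, n, c=1.0):
    """Evaluate c * r^(2l) * sqrt(l * max_i d_i * s * log(d/l) / n).

    r      -- magnitude bound on the weights
    l      -- network depth (number of layers)
    widths -- layer widths d_1, ..., d_l
    s      -- weight sparsity level
    n      -- number of training samples
    c      -- unknown constant hidden by the big-O (set to 1 here)
    """
    d = sum(widths)  # ASSUMPTION: network size d taken as the sum of layer widths
    return c * r ** (2 * l) * math.sqrt(l * max(widths) * s * math.log(d / l) / n)

# Illustrative numbers only (not from the paper): depth 3, widths 128/64/10,
# sparsity 500, magnitude bound 1.0, and 10,000 samples.
print(gradient_convergence_bound(r=1.0, l=3, widths=[128, 64, 10], s=500, n=10_000))
```

The qualitative behavior matches the abstract: the bound shrinks as $1/\sqrt{n}$ with more samples, while depth enters through both the $\sqrt{l}$ factor and, more sharply, the $r^{2l}$ factor whenever $r > 1$.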
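And here is a minimal sketch of input data satisfying Assumption 1 as quoted in the Open Datasets row. It relies only on the standard fact that a centered Gaussian with standard deviation $\tau$ is $\tau^2$-sub-Gaussian; the dimensions and seed are arbitrary illustrations, not values from the paper.

```python
import numpy as np

# A minimal illustration of Assumption 1 (zero-mean, tau^2-sub-Gaussian input):
# a centered Gaussian with standard deviation tau is tau^2-sub-Gaussian, so it
# is the simplest distribution satisfying the assumption.
rng = np.random.default_rng(0)
d0, n, tau = 20, 1000, 1.0
X = rng.normal(loc=0.0, scale=tau, size=(n, d0))  # n samples of x in R^{d0}

# Sanity check: the per-coordinate sample means should be close to zero.
print(np.abs(X.mean(axis=0)).max())
```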