Understanding Weight Normalized Deep Neural Networks with Rectified Linear Units
Authors: Yixi Xu, Xiao Wang
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper presents a general framework for norm-based capacity control for Lp,q weight normalized deep neural networks. We establish the upper bound on the Rademacher complexities of this family. With an Lp,q normalization where q ≤ p* and 1/p + 1/p* = 1, we discuss properties of a width-independent capacity control, which only depends on the depth by a square root term. We further analyze the approximation properties of Lp,q weight normalized deep neural networks. In particular, for an L1,∞ weight normalized network, the approximation error can be controlled by the L1 norm of the output layer, and the corresponding generalization error only depends on the architecture by the square root of the depth. (A minimal sketch of this Lp,q normalization appears after the table.) |
| Researcher Affiliation | Academia | Yixi Xu, Department of Statistics, Purdue University, West Lafayette, IN 47907, xu573@purdue.edu; Xiao Wang, Department of Statistics, Purdue University, West Lafayette, IN 47907, wangxiao@purdue.edu |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide any statement or link regarding open-source code for the described methodology. |
| Open Datasets | No | The paper does not mention using any datasets for training or evaluation, nor does it provide any concrete access information for a publicly available dataset. |
| Dataset Splits | No | The paper does not describe any dataset splits for training, validation, or testing, as it focuses on theoretical analysis rather than empirical experimentation. |
| Hardware Specification | No | The paper does not specify any hardware used for running experiments, as it focuses on theoretical analysis. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers, as it focuses on theoretical analysis rather than implementation details. |
| Experiment Setup | No | The paper does not provide details about an experimental setup, hyperparameters, or training configurations, as it focuses on theoretical analysis. |
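
The Lp,q normalization at the heart of the paper is easy to state concretely: take the Lp norm of each hidden unit's incoming weight vector, then the Lq norm of those per-unit norms, and constrain the result. The sketch below is not from the paper (which ships no code, per the table above); it is a minimal NumPy illustration under an assumed row-wise convention, with hypothetical helper names `lpq_norm` and `lpq_normalize`.

```python
import numpy as np

def lpq_norm(W, p, q):
    """L_{p,q} norm of a weight matrix W.

    Assumes the convention that each row of W holds the incoming
    weights of one hidden unit: take the L_p norm of every row,
    then the L_q norm of the vector of row norms.
    """
    row_norms = np.linalg.norm(W, ord=p, axis=1)
    if np.isinf(q):
        return row_norms.max()
    return np.linalg.norm(row_norms, ord=q)

def lpq_normalize(W, p, q, c=1.0):
    """Rescale W so its L_{p,q} norm is at most c (a projection-style
    normalization; the paper analyzes the capacity of such constrained
    networks rather than prescribing a training procedure)."""
    n = lpq_norm(W, p, q)
    return W if n <= c else W * (c / n)

# Example: the L_{1,inf} case highlighted in the abstract.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 128))            # 64 hidden units, 128 inputs
W_hat = lpq_normalize(W, p=1, q=np.inf)
print(lpq_norm(W_hat, p=1, q=np.inf))     # <= 1.0 after normalization
```

For the L1,∞ case from the abstract, the constraint caps the L1 norm of every unit's incoming weights at c, independent of how many units the layer has, which is consistent with the width-independent capacity control the abstract describes.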