On the Expressive Power of Deep Neural Networks

Authors: Maithra Raghu, Ben Poole, Jon Kleinberg, Surya Ganguli, Jascha Sohl-Dickstein

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate this phenomenon through experiments on MNIST and CIFAR-10, where the network is much less robust to noise in its lower layers and performs better when those layers are trained well. We also explore the effects of regularization methods on trajectory length as the network trains, and propose a less computationally intensive method of regularization, trajectory regularization, that offers the same performance as batch normalization.
Researcher Affiliation | Collaboration | Cornell University, Google Brain, Stanford University.
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide an explicit statement about the availability of source code or a link to a code repository.
Open Datasets | Yes | We demonstrate this phenomenon through experiments on MNIST and CIFAR-10.
Dataset Splits | No | The paper mentions 'train and test accuracy' and training on MNIST and CIFAR-10, but does not provide explicit dataset split information (percentages, sample counts, or predefined splits) for training, validation, or testing.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or machine specifications) used to run the experiments are provided in the paper.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., library names with versions) are mentioned in the paper.
Experiment Setup | Yes | The networks had width k = 100, weight variance σ_w² = 1, and hard-tanh nonlinearities. Two networks were initialized with σ_w² = 2 and trained to high test accuracy on CIFAR10 and MNIST. In implementation, we typically scale the additional loss above with a constant (0.01) to reduce magnitude in comparison to classification loss. (This setup and the trajectory-regularization term are sketched below the table.)
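The Experiment Setup row above fixes the random-network configuration: fully connected layers of width k = 100, weight variance σ_w² = 1, and hard-tanh nonlinearities. Below is a minimal NumPy sketch of that setup, assuming the σ_w²/k fan-in scaling commonly used in this line of mean-field analysis and zero biases; the function names (`init_hard_tanh_net`, `forward`) are illustrative, not from the paper.

```python
import numpy as np

def hard_tanh(x):
    # Piecewise-linear saturating nonlinearity: clip pre-activations to [-1, 1].
    return np.clip(x, -1.0, 1.0)

def init_hard_tanh_net(depth, width=100, sigma_w2=1.0, input_dim=784, seed=0):
    # Weights drawn i.i.d. from N(0, sigma_w2 / fan_in); biases set to zero in
    # this sketch (the paper also considers a bias variance, omitted here).
    rng = np.random.default_rng(seed)
    dims = [input_dim] + [width] * depth
    layers = []
    for fan_in, fan_out in zip(dims[:-1], dims[1:]):
        W = rng.normal(0.0, np.sqrt(sigma_w2 / fan_in), size=(fan_out, fan_in))
        b = np.zeros(fan_out)
        layers.append((W, b))
    return layers

def forward(layers, x):
    # Return the hidden activations at every layer for a batch x of shape [n, input_dim].
    activations = []
    h = x
    for W, b in layers:
        h = hard_tanh(h @ W.T + b)
        activations.append(h)
    return activations
```

For example, `forward(init_hard_tanh_net(depth=10), x)` returns the per-layer images of a batch `x`, which is the kind of quantity the paper's trajectory-length measurements are computed over.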
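The Research Type and Experiment Setup rows also mention trajectory regularization, an additional loss term scaled by a constant 0.01 and proposed as a cheaper alternative to batch normalization. The paper gives no pseudocode (see the Pseudocode row), so the sketch below is one hedged reading of the description: approximate the length of a hidden layer's image of an input trajectory by summing distances between consecutive points, and add that length, scaled by 0.01, to the classification loss. The interpolation scheme and helper names are assumptions for illustration.

```python
import numpy as np

def input_trajectory(x0, x1, n_points=200):
    # Points on a simple arc from x0 to x1 (exactly circular when x0 and x1 are
    # orthogonal with equal norms); a straight-line interpolation would also work.
    t = np.linspace(0.0, np.pi / 2.0, n_points)[:, None]
    return np.cos(t) * x0[None, :] + np.sin(t) * x1[None, :]

def trajectory_length(points):
    # Arc length approximated by summing Euclidean distances between
    # consecutive points of the (hidden-layer) trajectory.
    return np.linalg.norm(np.diff(points, axis=0), axis=1).sum()

def regularized_loss(classification_loss, hidden_trajectory, reg_scale=0.01):
    # Trajectory regularization as described in the quoted setup: the extra term
    # is scaled by 0.01 so it stays small relative to the classification loss.
    return classification_loss + reg_scale * trajectory_length(hidden_trajectory)
```

In a training loop, `hidden_trajectory` would be the activations of the regularized layer evaluated on the interpolated inputs (e.g., via `forward` above), recomputed at each step so the penalty tracks trajectory length as it grows during training.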