On the Expressive Power of Deep Neural Networks
Authors: Maithra Raghu, Ben Poole, Jon Kleinberg, Surya Ganguli, Jascha Sohl-Dickstein
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate this phenomenon through experiments on MNIST and CIFAR-10, where the network displays much less robustness to noise in the lower layers and performs better when those layers are trained well. We also explore the effects of regularization methods on trajectory length as the network trains and propose a less computationally intensive method of regularization, trajectory regularization, that offers the same performance as batch normalization. |
| Researcher Affiliation | Collaboration | Cornell University, Google Brain, Stanford University. |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide an explicit statement about the availability of source code or a link to a code repository. |
| Open Datasets | Yes | We demonstrate this phenomenon through experiments on MNIST and CIFAR-10 |
| Dataset Splits | No | The paper mentions 'train and test accuracy' and training on specific datasets (MNIST, CIFAR-10), but does not explicitly provide specific dataset split information (percentages, sample counts, or predefined splits) for training, validation, or testing. |
| Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory amounts, or computer specifications) used for running experiments are provided in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., library names with versions) are mentioned in the paper. |
| Experiment Setup | Yes | The networks had width k = 100, weight variance σ_w² = 1, and hard-tanh nonlinearities. Two networks were initialized with σ_w² = 2 and trained to high test accuracy on CIFAR-10 and MNIST. In implementation, we typically scale the additional loss above with a constant (0.01) to reduce magnitude in comparison to classification loss. |
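
Since neither code nor pseudocode is available, the setup row above can only be illustrated, not reproduced verbatim. The sketch below is a minimal, hypothetical PyTorch rendering of that configuration: a fully connected hard-tanh network of width k = 100 with weights drawn at variance σ_w²/k (the scaling used in the paper's analysis), and a trajectory-regularization term scaled by 0.01 and added to the classification loss. The arc-length estimate used for the trajectory term and all names (`HardTanhMLP`, `trajectory_length`) are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

WIDTH = 100        # k = 100
SIGMA_W2 = 2.0     # sigma_w^2 = 2 for the trained CIFAR-10 / MNIST networks
TRAJ_SCALE = 0.01  # constant scaling the additional (trajectory) loss


class HardTanhMLP(nn.Module):
    """Fully connected hard-tanh network of constant width k (hypothetical stand-in)."""

    def __init__(self, in_dim: int, num_classes: int, depth: int = 5):
        super().__init__()
        dims = [in_dim] + [WIDTH] * depth
        self.hidden = nn.ModuleList(
            [nn.Linear(dims[i], dims[i + 1]) for i in range(depth)]
        )
        self.out = nn.Linear(WIDTH, num_classes)
        # Draw weights with variance sigma_w^2 / fan_in, matching the paper's scaling.
        for layer in list(self.hidden) + [self.out]:
            nn.init.normal_(layer.weight, std=(SIGMA_W2 / layer.in_features) ** 0.5)
            nn.init.zeros_(layer.bias)

    def forward(self, x):
        for layer in self.hidden:
            x = F.hardtanh(layer(x))
        return self.out(x), x  # logits and final hidden-layer activations


def trajectory_length(model: HardTanhMLP, x0: torch.Tensor, x1: torch.Tensor,
                      n_points: int = 20) -> torch.Tensor:
    """Arc length of the final hidden layer's image of the segment from x0 to x1."""
    t = torch.linspace(0.0, 1.0, n_points).unsqueeze(1)
    line = (1 - t) * x0.unsqueeze(0) + t * x1.unsqueeze(0)  # points along the segment
    _, hidden = model(line)
    return (hidden[1:] - hidden[:-1]).norm(dim=1).sum()


def regularized_loss(model: HardTanhMLP, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Classification loss plus the trajectory term, scaled by 0.01 as in the paper."""
    logits, _ = model(x)
    traj = trajectory_length(model, x[0], x[1])  # segment between two batch inputs
    return F.cross_entropy(logits, y) + TRAJ_SCALE * traj
```

Batch normalization is deliberately omitted from this sketch, since the paper positions trajectory regularization as a cheaper substitute that matches its performance.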