Disentangling Trainability and Generalization in Deep Neural Networks

Authors: Lechao Xiao, Jeffrey Pennington, Samuel Schoenholz

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | These theoretical results are corroborated experimentally on CIFAR10 for a variety of network architectures and we include a colab notebook that reproduces the essential results of the paper. In each case, we provide empirical evidence to support our theoretical conclusions.
Researcher Affiliation | Industry | Google Research, Brain Team. Correspondence to: Lechao Xiao <xlc@google.com>, Samuel S. Schoenholz <schsam@google.com>.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | we include a colab notebook that reproduces the essential results of the paper. Available at: https://tinyurl.com/ybsfxk5y.
Open Datasets | Yes | We conduct an experiment training finite-width CNN-F networks with 1k training samples from CIFAR-10.
Dataset Splits | No | The paper discusses training and test sets but does not explicitly mention a separate validation set or provide details of such a split.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or memory.
Software Dependencies | No | The paper mentions using the Neural Tangents library (Novak et al., 2019a) but does not specify its version number, nor does it list versions for other key software components.
Experiment Setup | Yes | We train each network using SGD with batch size b = 256 and learning rate 0.1. We see in Figure 2 (a) that deep in the chaotic phase all configurations reach perfect training accuracy, but the network completely fails to generalize in the sense that test accuracy is around 10%. As expected, in the ordered phase we see that although the training accuracy degrades, generalization improves. As expected, we see that the depth-scales ξ1 and ξ* control trainability in the ordered phase and generalization in the chaotic phase respectively. We also conduct extra experiments for FCN with more training points (16k); see Figure 6. (A minimal sketch of this training setup follows the table.)
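
The Experiment Setup row quotes enough hyperparameters (SGD, batch size 256, learning rate 0.1, finite-width CNN-F networks, 1k CIFAR-10 training samples) to sketch the training loop in code. The block below is a minimal illustrative sketch in JAX using the Neural Tangents stax layers, not the authors' colab: the depth, width, activation, (sigma_w, sigma_b) initialization values, MSE loss, step count, and the random stand-in data are assumptions made only for illustration.

    import jax
    import jax.numpy as jnp
    from jax import grad, jit, random
    from neural_tangents import stax

    # A small CNN with a flattened, fully-connected readout, standing in for
    # the paper's CNN-F architecture. W_std / b_std play the role of the
    # (sigma_w, sigma_b) initialization hyperparameters that place the network
    # in the ordered or chaotic phase; the values here are placeholders.
    def make_cnn(depth=8, width=128, W_std=1.5, b_std=0.1, num_classes=10):
        layers = []
        for _ in range(depth):
            layers += [stax.Conv(width, (3, 3), padding='SAME',
                                 W_std=W_std, b_std=b_std),
                       stax.Relu()]
        layers += [stax.Flatten(),
                   stax.Dense(num_classes, W_std=W_std, b_std=b_std)]
        return stax.serial(*layers)

    init_fn, apply_fn, _ = make_cnn()

    # Random arrays as a stand-in for the 1k-example CIFAR-10 subset.
    key = random.PRNGKey(0)
    key, xk, yk, pk = random.split(key, 4)
    x_train = random.normal(xk, (1024, 32, 32, 3))
    y_train = jax.nn.one_hot(random.randint(yk, (1024,), 0, 10), 10)

    _, params = init_fn(pk, x_train.shape)

    # Plain SGD with batch size 256 and learning rate 0.1, matching the
    # hyperparameters quoted from the paper; the step count is arbitrary.
    batch_size, lr, num_steps = 256, 0.1, 1000

    def loss(params, x, y):
        logits = apply_fn(params, x)
        # MSE loss (an assumption; the paper's quote does not name the loss).
        return 0.5 * jnp.mean(jnp.sum((logits - y) ** 2, axis=-1))

    @jit
    def sgd_step(params, x, y):
        grads = grad(loss)(params, x, y)
        return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

    num_train = x_train.shape[0]
    for step in range(num_steps):
        idx = random.permutation(random.fold_in(key, step), num_train)[:batch_size]
        params = sgd_step(params, x_train[idx], y_train[idx])

In the paper the analogous experiment sweeps depth and the initialization hyperparameters across the ordered and chaotic phases; this sketch fixes a single such configuration.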