Disentangling Trainability and Generalization in Deep Neural Networks
Authors: Lechao Xiao, Jeffrey Pennington, Samuel Schoenholz
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | These theoretical results are corroborated experimentally on CIFAR10 for a variety of network architectures and we include a colab notebook that reproduces the essential results of the paper. In each case, we provide empirical evidence to support our theoretical conclusions. |
| Researcher Affiliation | Industry | Google Research, Brain Team. Correspondence to: Lechao Xiao <xlc@google.com>, Samuel S. Schoenholz <schsam@google.com>. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | we include a colab notebook that reproduces the essential results of the paper. Available at: https://tinyurl.com/ybsfxk5y. |
| Open Datasets | Yes | We conduct an experiment training finite-width CNN-F networks with 1k training samples from CIFAR-10 |
| Dataset Splits | No | The paper discusses training and test sets but does not explicitly mention a separate validation set or provide details about its split. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or memory. |
| Software Dependencies | No | The paper mentions using 'Neural Tangents library (Novak et al., 2019a)' but does not specify its version number, nor does it list versions for other key software components. |
| Experiment Setup | Yes | We train each network using SGD with batch size b = 256 and learning rate η = 0.1 η_theory. Figure 2 (a) shows that deep in the chaotic phase all configurations reach perfect training accuracy, but the networks completely fail to generalize in the sense that test accuracy is around 10%. As expected, in the ordered phase training accuracy degrades while generalization improves, and the two depth scales control trainability in the ordered phase and generalization in the chaotic phase, respectively. We also conduct extra experiments for FCN with more training points (16k); see Figure 6. A hedged code sketch of this setup follows the table. |
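
The Experiment Setup row above lists the only training hyperparameters the paper reports: SGD, batch size b = 256, learning rate 0.1 · η_theory, and a 1k-sample CIFAR-10 training subset. The sketch below shows how those settings could be wired into a simple JAX training loop. The two-layer fully connected stand-in model, the random stand-in data, and the value η_theory = 1.0 are illustrative assumptions only, not details from the paper (which trains CNN-F architectures and derives the maximal stable learning rate from its theory).

```python
# Minimal sketch (not the authors' code) of the reported experiment setup:
# SGD, batch size 256, learning rate 0.1 * eta_theory, 1k CIFAR-10 samples.
# The model, the random stand-in data, and ETA_THEORY = 1.0 are assumptions.
import jax
import jax.numpy as jnp

BATCH_SIZE = 256                    # paper: b = 256
ETA_THEORY = 1.0                    # assumption; the paper derives a theoretical max step size
LEARNING_RATE = 0.1 * ETA_THEORY    # paper: eta = 0.1 * eta_theory
NUM_TRAIN = 1_000                   # paper: 1k CIFAR-10 training samples
NUM_CLASSES = 10

key = jax.random.PRNGKey(0)
k_data, k_w1, k_w2 = jax.random.split(key, 3)

# Random stand-in for the 1k-sample CIFAR-10 subset (32x32x3 images, flattened).
x_train = jax.random.normal(k_data, (NUM_TRAIN, 32 * 32 * 3))
y_train = jax.nn.one_hot(jnp.arange(NUM_TRAIN) % NUM_CLASSES, NUM_CLASSES)

# Two-layer fully connected stand-in for the CNN-F architectures used in the paper.
params = {
    "w1": jax.random.normal(k_w1, (32 * 32 * 3, 512)) / jnp.sqrt(32 * 32 * 3),
    "w2": jax.random.normal(k_w2, (512, NUM_CLASSES)) / jnp.sqrt(512),
}

def loss_fn(params, x, y):
    logits = jax.nn.relu(x @ params["w1"]) @ params["w2"]
    return jnp.mean((logits - y) ** 2)  # MSE loss as a simple stand-in objective

@jax.jit
def sgd_step(params, x, y):
    # Plain SGD update with the reported learning rate.
    grads = jax.grad(loss_fn)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - LEARNING_RATE * g, params, grads)

for step in range(100):
    # Sample a minibatch of size 256 from the 1k training points.
    idx = jax.random.permutation(jax.random.PRNGKey(step), NUM_TRAIN)[:BATCH_SIZE]
    params = sgd_step(params, x_train[idx], y_train[idx])
```

For a faithful reproduction, the authors' colab notebook linked in the Open Source Code row is the reference implementation; the sketch above only illustrates where the reported hyperparameters enter the loop.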