Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation

Authors: Colin Wei, Tengyu Ma

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | As a complement to the main theoretical results in this paper, we show empirically in Section 6 that directly regularizing our complexity measure can result in improved test performance. We provide preliminary experiments demonstrating that the proposed complexity measure and generalization bounds are empirically relevant. We show that regularizing the complexity measure leads to better test accuracy.
Researcher Affiliation | Academia | Colin Wei, Computer Science Department, Stanford University, colinwei@stanford.edu; Tengyu Ma, Computer Science Department, Stanford University, tengyuma@stanford.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access information (e.g., a repository link or an explicit statement of code release) for the source code of the described methodology.
Open Datasets | Yes | Figure 1 shows the results for models trained and tested on CIFAR10 in low learning rate and no data augmentation settings, which are settings where generalization typically suffers. Table 1: Test error for a model trained on CIFAR10 in various settings.
Dataset Splits | No | The paper mentions training and testing on CIFAR10 but does not specify the exact train/validation/test splits, percentages, or splitting methodology.
Hardware Specification | No | The paper does not report the specific hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not list specific software dependencies (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | The threshold on the Frobenius norm in the regularization is inspired by the truncations in our augmented loss (in all our experiments, we choose σ = 0.1). We tune the coefficient λ as a hyperparameter. In our experiments, we took the regularized indices i to be the last layers in each residual block as well as layers in residual blocks following a Batch Norm in the standard WideResNet16 architecture. In the Layer Norm setting, we simply replaced Batch Norm layers with Layer Norm. The remaining hyperparameter settings are standard for WideResNet; for additional details see Section I.1.
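
To make the quoted Experiment Setup more concrete, below is a minimal, hypothetical PyTorch sketch of a thresholded Frobenius-norm penalty with threshold σ = 0.1 and a tuned coefficient λ added to a standard cross-entropy loss. This is not the paper's exact regularizer, which is derived from its data-dependent augmented loss (see Sections 6 and I.1 of the paper); in particular, penalizing layer weights, the hinge form relu(‖·‖_F − σ)², the placeholder `selected_layers` (meant to stand in for the last layer of each residual block and layers following Batch Norm in WideResNet16), and the value of `LAM` are illustrative assumptions.

```python
# Hypothetical sketch: a thresholded Frobenius-norm penalty added to the training
# loss. Not the paper's exact regularizer (see Sections 6 and I.1 of the paper);
# the penalty target (layer weights), hinge form, and LAM value are assumptions.
import torch
import torch.nn.functional as F

SIGMA = 0.1   # threshold on the Frobenius norm (sigma = 0.1, as quoted from the paper)
LAM = 1e-3    # regularization coefficient lambda, tuned as a hyperparameter (assumed value)

def frobenius_penalty(selected_layers, sigma=SIGMA, lam=LAM):
    """Sum a thresholded squared Frobenius-norm penalty over the selected layers.

    `selected_layers` is assumed to hold the modules to regularize, e.g. the last
    layer of each residual block in a WideResNet16 (per the quoted description).
    """
    penalty = 0.0
    for layer in selected_layers:
        frob = torch.linalg.norm(layer.weight)         # Frobenius norm of the weight tensor
        penalty = penalty + F.relu(frob - sigma) ** 2  # penalize only norms above the threshold
    return lam * penalty

def training_step(model, selected_layers, x, y, optimizer):
    """One optimization step on cross-entropy plus the sketched regularizer."""
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + frobenius_penalty(selected_layers)
    loss.backward()
    optimizer.step()
    return loss.item()
```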