Predicting Deep Neural Network Generalization with Perturbation Response Curves
Authors: Yair Schiff, Brian Quanz, Payel Das, Pin-Yu Chen
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we propose a new framework for evaluating the generalization capabilities of trained networks. We use perturbation response (PR) curves that capture the accuracy change of a given network as a function of varying levels of training sample perturbation. From these PR curves, we derive novel statistics that capture generalization capability. Specifically, we introduce two new measures for accurately predicting generalization gaps: the Gi-score and Pal-score... Using our framework applied to intra and inter-class sample mixup, we attain better predictive scores than the current state-of-the-art measures on a majority of tasks in the PGDL competition. |
| Researcher Affiliation | Industry | Yair Schiff^1, Brian Quanz^2, Payel Das^2, Pin-Yu Chen^2; ^1 IBM Watson, ^2 IBM Research; {yair.schiff@,blquanz@us.,daspa@us.,pin-yu.chen@}ibm.com |
| Pseudocode | Yes | This methodology is summarized in Algorithm 1 in Appendix A.3. ... We summarize this in Algorithm 2 in Appendix A.4. ... We give the pseudocode for the Pal-score in Algorithm 4 in Appendix A.6. |
| Open Source Code | Yes | We use the trained networks and their configurations, training data, and starting kit code from the competition; all open-sourced and provided under Apache 2.0 license1. The code includes utilities for loading models and model details and running scoring. To this base repository, we added our methods for performing different perturbations at different layers, computing PR curves, and computing our proposed Gi and Pal-scores. 1https://github.com/google-research/google-research/tree/master/pgdl |
| Open Datasets | Yes | The datasets are comprised of CIFAR-10 [28], SVHN [23], CINIC-10 [29], Oxford Flowers [30], Oxford Pets [31], and Fashion MNIST [32]. |
| Dataset Splits | No | The paper does not explicitly provide details about training/validation/test dataset splits, but rather refers to evaluating on 'a sample of the training data' for generating PR curves and on the 'test set' for measuring generalization. |
| Hardware Specification | Yes | Each run is performed with 4 CPUs, 4 GB RAM, and 1 V100 GPU and batch size 128, submitted as resource-restricted jobs to a cluster. |
| Software Dependencies | No | The paper mentions using PyTorch and PyTorch Lightning, and references TensorFlow, but does not provide specific version numbers for these software dependencies in the context of their experiments. |
| Experiment Setup | Yes | For all models, we train with batch sizes of either 1024, 2048, or 4096 and learning rates of either 1e-4 or 1e-5. All models are trained with Adam optimization and a learning rate scheduler that reduced learning rate on training loss plateaus. |
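
The Research Type and Pseudocode rows above summarize the paper's core method: perturbation response (PR) curves that track accuracy as training samples are increasingly perturbed via intra- and inter-class mixup, summarized by the Gini-coefficient-inspired Gi-score and the Palma-ratio-inspired Pal-score. The sketch below is illustrative only and is not the authors' implementation: the helper names (`mixup_accuracy`, `pr_curve`, `gi_score`, `pal_score`), the curve normalization, and the exact Gini/Palma variants are assumptions; the authoritative definitions are the paper's Algorithms 1-4 referenced in the Pseudocode row.

```python
import numpy as np
import torch


def mixup_accuracy(model, x, y, alpha, intra_class=True):
    """Accuracy on samples mixed with randomly chosen partners.

    Hypothetical helper: each input is interpolated with another training
    sample (from the same class when `intra_class` is True) using mixing
    coefficient `alpha` in [0, 1]; alpha = 0 leaves the data unperturbed.
    """
    perm = torch.randperm(x.size(0))
    if intra_class:
        # Re-draw partners so that each pair shares a label (sketch only).
        for i in range(x.size(0)):
            same = (y == y[i]).nonzero().flatten()
            perm[i] = same[torch.randint(len(same), (1,)).item()]
    x_mix = (1 - alpha) * x + alpha * x[perm]
    with torch.no_grad():
        preds = model(x_mix).argmax(dim=1)
    return (preds == y).float().mean().item()


def pr_curve(model, x, y, alphas):
    """Perturbation response curve: accuracy at each perturbation level."""
    return np.array([mixup_accuracy(model, x, y, a) for a in alphas])


def gi_score(accuracies):
    """Gini-coefficient-style summary of a normalized PR curve (assumption:
    the curve is normalized by its unperturbed accuracy and the score is
    one minus the Gini coefficient of the normalized values)."""
    curve = np.sort(accuracies / max(accuracies[0], 1e-12))
    n = len(curve)
    cum = np.cumsum(curve)
    gini = (n + 1 - 2 * np.sum(cum) / cum[-1]) / n
    return 1.0 - gini


def pal_score(accuracies):
    """Palma-ratio-style summary (assumption: share of the normalized curve
    held by its top 10% of values divided by the share of its bottom 40%)."""
    curve = np.sort(accuracies / max(accuracies[0], 1e-12))
    n = len(curve)
    total = curve.sum()
    bottom = curve[: int(np.floor(0.4 * n))].sum() / total
    top = curve[int(np.ceil(0.9 * n)):].sum() / total
    return top / max(bottom, 1e-12)
```

In this sketch, a higher Gi-score corresponds to a curve that retains accuracy more evenly across perturbation levels; the sign convention and normalization in the paper may differ, so the pseudocode in its appendices should be treated as the reference.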
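The Software Dependencies and Experiment Setup rows indicate PyTorch-based training with Adam, batch sizes of 1024-4096, learning rates of 1e-4 or 1e-5, and a scheduler that reduces the learning rate on training-loss plateaus. Below is a minimal sketch of such a configuration, assuming `torch.optim.lr_scheduler.ReduceLROnPlateau`; the scheduler's `factor` and `patience` values are placeholders not reported in the paper.

```python
import torch
from torch.utils.data import DataLoader


def build_training_setup(model, train_dataset, batch_size=1024, lr=1e-4):
    """Sketch of the reported configuration: Adam optimization plus a
    scheduler that reduces the learning rate when training loss plateaus.
    Batch size (1024/2048/4096) and learning rate (1e-4 or 1e-5) follow the
    quoted values; `factor` and `patience` are assumed placeholders.
    """
    loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.1, patience=5
    )
    return loader, optimizer, scheduler


# Usage: step the scheduler on the epoch's mean training loss after each
# training epoch, e.g. scheduler.step(epoch_train_loss).
```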