Inconsistency, Instability, and Generalization Gap of Deep Neural Network Training
Authors: Rie Johnson, Tong Zhang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical study based on this analysis shows that instability and inconsistency are strongly predictive of generalization gap in various settings. Our empirical study consists of three parts. |
| Researcher Affiliation | Collaboration | Rie Johnson (RJ Research Consulting, New York, USA; riejohnson@gmail.com); Tong Zhang (HKUST, Hong Kong; tozhang@tongzhang-ml.org). This work was done when the second author was jointly with Google Research. |
| Pseudocode | Yes | Algorithm 1: Training with consistency encouragement. Algorithm 2: Our semi-supervised variant of co-distillation. (An illustrative code sketch of these algorithms is given below the table.) |
| Open Source Code | No | The paper mentions using publicly available models and tools (e.g., EfficientNet-B0, SAM from github.com/google-research/sam), but it does not state that the authors' own implementation is open-sourced, nor does it provide a link to their own code repository. |
| Open Datasets | Yes | Table 1: Datasets. Name: ImageNet [7], Food101 [3], Dogs [21], Cars [23], CIFAR-10 [24], MNLI [36], QNLI [36]. |
| Dataset Splits | Yes | The expectation values involved in CP and SP were estimated by taking the average; in particular, the expectation over data distribution Z was estimated on the held-out unlabeled data disjoint from training data or test data. (K, J) was set to (4,8) for CIFAR-10/100 and (4,4) for ImageNet, and the size of each training set was set to 4K for CIFAR-10/100 and 120K (10%) for ImageNet. ... on the development data (held-out 5K data points). |
| Hardware Specification | Yes | All the experiments were done using GPUs (A100 or older). ... No TPUs were used. |
| Software Dependencies | No | The paper mentions software components like SGD, AdamW, and references models like RoBERTa-base, but it does not provide specific version numbers for any libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | Yes | Table 10: Basic settings shared by all the models for each case (Case# 1–7, 10; images), listing training type (from scratch / fine-tuning / distillation), dataset, network, batch size, epochs, update steps, warmup steps, learning rate, schedule, optimizer, weight decay, label smoothing, iterate averaging, gradient clipping, and data augmentation. Table 12: Hyperparameters for SAM (m-sharpness and ρ) for Case# 1–7, 8/9, and 10. |
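
The paper gives Algorithm 1 (training with consistency encouragement) and Algorithm 2 (a semi-supervised variant of co-distillation) only as pseudocode. Below is a minimal, illustrative PyTorch-style sketch of a co-distillation step with a consistency penalty, where an optional unlabeled batch stands in for the semi-supervised variant; the symmetric-KL penalty, the fixed weight `lam`, and the function names are assumptions made for illustration, not the authors' exact algorithms.

```python
# Illustrative sketch only: a co-distillation-style update with a consistency
# penalty between two peer models. The symmetric-KL term, the fixed weight `lam`,
# and the optional unlabeled batch are assumptions, not the paper's exact
# Algorithm 1 / Algorithm 2.
import torch.nn.functional as F


def consistency_penalty(logits_a, logits_b):
    """Symmetric KL divergence between the two models' predictive distributions."""
    log_p = F.log_softmax(logits_a, dim=-1)
    log_q = F.log_softmax(logits_b, dim=-1)
    kl_pq = F.kl_div(log_q, log_p, reduction="batchmean", log_target=True)
    kl_qp = F.kl_div(log_p, log_q, reduction="batchmean", log_target=True)
    return 0.5 * (kl_pq + kl_qp)


def codistill_step(model_a, model_b, opt_a, opt_b, x, y, x_unlabeled=None, lam=1.0):
    """One training step for two peer models: supervised loss on the labeled batch
    plus a consistency penalty; if an unlabeled batch is supplied, the penalty is
    also applied there (semi-supervised variant)."""
    logits_a, logits_b = model_a(x), model_b(x)
    loss = F.cross_entropy(logits_a, y) + F.cross_entropy(logits_b, y)
    loss = loss + lam * consistency_penalty(logits_a, logits_b)
    if x_unlabeled is not None:
        loss = loss + lam * consistency_penalty(model_a(x_unlabeled), model_b(x_unlabeled))
    opt_a.zero_grad()
    opt_b.zero_grad()
    loss.backward()
    opt_a.step()
    opt_b.step()
    return loss.item()
```

In this sketch both peer models are updated from a single joint loss; the exact divergence, loss weighting, and schedule used in the paper should be taken from its own pseudocode and the settings in Tables 10–12.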