reproducibilityindex.ai

Inconsistency, Instability, and Generalization Gap of Deep Neural Network Training

Authors: Rie Johnson, Tong Zhang

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our empirical study based on this analysis shows that instability and inconsistency are strongly predictive of generalization gap in various settings. Our empirical study consists of three parts.
Researcher Affiliation	Collaboration	Rie Johnson, RJ Research Consulting, New York, USA, riejohnson@gmail.com; Tong Zhang, HKUST, Hong Kong, tozhang@tongzhang-ml.org. This work was done when the second author was jointly with Google Research.
Pseudocode	Yes	Algorithm 1: Training with consistency encouragement. Algorithm 2: Our semi-supervised variant of co-distillation.
Open Source Code	No	The paper mentions using publicly available models and tools (e.g., EfficientNet-B0, SAM from github.com/google-research/sam), but it does not state that its own implementation is open-sourced, and it provides no link to a code repository of its own.
Open Datasets	Yes	Table 1: Datasets. Name: ImageNet [7], Food101 [3], Dogs [21], Cars [23], CIFAR-10 [24], MNLI [36], QNLI [36].
Dataset Splits	Yes	The expectation values involved in CP and SP were estimated by taking the average; in particular, the expectation over data distribution Z was estimated on the held-out unlabeled data disjoint from training data or test data. (K, J) was set to (4,8) for CIFAR-10/100 and (4,4) for ImageNet, and the size of each training set was set to 4K for CIFAR-10/100 and 120K (10%) for ImageNet. ... on the development data (held-out 5K data points).
Hardware Specification	Yes	All the experiments were done using GPUs (A100 or older). ... no TPU
Software Dependencies	No	The paper mentions optimizers such as SGD and AdamW and references models like RoBERTa-base, but it does not provide specific version numbers for any libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used.
Experiment Setup	Yes	Table 10: Basic settings shared by all the models for each case (Case#1-7,10; images). Settings listed: Training type (From scratch, Fine-tuning, Distillation); Dataset; Network; Batch size; Epochs; Update steps; Warmup steps; Learning rate; Schedule; Optimizer; Weight decay; Label smooth; Iterate averaging; Gradient clipping; Data augment. Table 12: Hyperparameters for SAM. Case#: 1 2 3 4 5 6 7 8,9 10. Settings listed: m-sharpness; ρ.