Characterizing Structural Regularities of Labeled Data in Overparameterized Models
Authors: Ziheng Jiang, Chiyuan Zhang, Kunal Talwar, Michael C. Mozer
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We obtain empirical estimates of this score for individual instances in multiple data sets, and we show that the score identifies out-of-distribution and mislabeled examples at one end of the continuum and strongly regular examples at the other end. We estimate the C-scores with a series of approximations and apply the measure to analyze the structural regularities of the MNIST, CIFAR-10, CIFAR-100, and ImageNet training sets. |
| Researcher Affiliation | Collaboration | 1Paul G. Allen School of Computer Science, University of Washington, Seattle, WA, USA. 2OctoML.ai, Seattle, WA, USA. 3Work done while interning at Google. 4Google Research, Brain Team, Mountain View, CA, USA. 5Presently at Apple Inc., Cupertino, CA, USA. 6Department of Computer Science, University of Colorado Boulder, Boulder, CO, USA. |
| Pseudocode | Yes | See Algorithm 1 in the Appendix. |
| Open Source Code | Yes | To facilitate future research, we have released the pre-computed C-scores at (URL anonymized). Model checkpoints, code, and extra visualizations are available too. We provide code implementing our C-score estimation algorithms, and pre-computed C-scores and associated model checkpoints for CIFAR-10, CIFAR-100 and ImageNet (downloadable from https://pluskid.github.io/structural-regularity/). |
| Open Datasets | Yes | We apply the C-score estimate to analyze several common image data sets: MNIST (LeCun et al., 1998), CIFAR-10 / CIFAR-100 (Krizhevsky, 2009), and ImageNet (Russakovsky et al., 2015). For CIFAR-10 and CIFAR-100, the exported file contains two arrays, `labels` and `scores`. Both arrays are stored in the order of training examples as defined by the original data sets found at https://www.cs.toronto.edu/~kriz/cifar.html. In Figure 9a, we show the performance of models trained on the SVHN (Netzer et al., 2011) training set. (See the loading sketch below the table.) |
| Dataset Splits | Yes | In particular, we sample n dynamically according to the subset ratio s ∈ {10%, …, 90%} of the full available training set. For each s, 2,000 models are trained and held-out examples are evaluated. We train 2,000 ResNet-50 models, each with a random 70% subset of the ImageNet training set, and estimate the C-score based on those models. (See the estimation sketch below the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory amounts used for the experiments. |
| Software Dependencies | No | The paper mentions TensorFlow in its references but does not provide specific version numbers for TensorFlow or any other software libraries used. |
| Experiment Setup | Yes | See the supplementary materials for details on architectures and hyperparameters. In particular, we train 2,000 ResNet-50 models, each with a random 70% subset of the ImageNet training set. The left panel shows SGD training with a stagewise constant learning rate, and the right panel shows the Adam optimizer (Kingma & Ba, 2015), which scales the learning rate adaptively. |
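
The holdout procedure quoted in the Dataset Splits and Experiment Setup rows reduces to a compact loop: repeatedly train on a random subset and average each example's correctness whenever it is held out. The snippet below is a minimal sketch of that idea, not the authors' released code; `train_model` is a hypothetical stand-in for the actual architecture and training loop (e.g., ResNet-50 with SGD) and is assumed to return an object with a `.predict()` method.

```python
# Minimal sketch of holdout C-score estimation: train many models on random
# subsets and average per-example held-out correctness. Not the released code.
import numpy as np

def estimate_cscores(X, y, train_model, subset_ratio=0.7, n_models=2000):
    """Average each example's held-out accuracy over n_models random splits."""
    n = len(y)
    correct = np.zeros(n)   # times each example was classified correctly when held out
    held_out = np.zeros(n)  # times each example appeared in the held-out split
    for _ in range(n_models):
        idx = np.random.permutation(n)
        cut = int(subset_ratio * n)
        train_idx, test_idx = idx[:cut], idx[cut:]
        model = train_model(X[train_idx], y[train_idx])  # hypothetical trainer
        preds = model.predict(X[test_idx])
        correct[test_idx] += (preds == y[test_idx])
        held_out[test_idx] += 1
    return correct / np.maximum(held_out, 1)  # per-example C-score estimate
```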
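The Open Datasets row notes that the exported score files contain two arrays, `labels` and `scores`, in the original training-set order. A minimal loading sketch, assuming a NumPy `.npz` export and a hypothetical file name (consult the project page linked above for the actual download format):

```python
# Loading sketch for the released pre-computed C-scores. The file name is
# hypothetical; the `labels`/`scores` keys are from the paper's description.
import numpy as np

data = np.load("cifar10-cscores.npz")  # hypothetical file name
labels, scores = data["labels"], data["scores"]
# Both arrays follow the original CIFAR-10 training-set order, so scores[i]
# is the C-score of training example i.
hardest = np.argsort(scores)[:10]      # lowest C-scores: most irregular examples
print(labels[hardest], scores[hardest])
```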