Assessing Generalization of SGD via Disagreement
Authors: Yiding Jiang, Vaishnavh Nagarajan, Christina Baek, J. Zico Kolter
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show that the test error of deep networks can be estimated by training the same architecture on the same training set but with two different runs of Stochastic Gradient Descent (SGD), and then measuring the disagreement rate between the two networks on unlabeled test data. (See the disagreement-rate sketch after the table.) |
| Researcher Affiliation | Collaboration | Yiding Jiang, Carnegie Mellon University, ydjiang@cmu.edu; Vaishnavh Nagarajan, Google Research, vaishnavh@google.com; Christina Baek and J. Zico Kolter, Carnegie Mellon University, {kbaek,zkolter}@cs.cmu.edu |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We observe on the SVHN (Netzer et al., 2011), CIFAR-10/100 (Krizhevsky et al., 2009) datasets, and for variants of Residual Networks (He et al., 2016) and Convolutional Networks (Lin et al., 2013)... |
| Dataset Splits | No | The paper mentions training data and test data, but does not explicitly provide details about validation dataset splits or the use of a separate validation set for hyperparameter tuning. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types) used to run the experiments. |
| Software Dependencies | No | The paper mentions software components like 'Momentum SGD' and architectural types (ResNet, CNN, FCN), but does not specify version numbers for any software libraries or frameworks used. |
| Experiment Setup | Yes | C.1 PAIRWISE DISAGREEMENT ... 1. width multiplier: {1, 2} 2. initial learning rate: {0.1, 0.05} 3. weight decay: {0.0001, 0.0} 4. minibatch size: {200, 100} 5. data augmentation: {No, Yes} ... All models are trained with SGD with momentum of 0.9. The learning rate decays by a factor of 10 every 50 epochs. The training stops when the training accuracy reaches 100%. (See the optimizer sketch after the table.) |
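
For context on the Research Type evidence above: the paper's estimator trains the same architecture twice with independent SGD runs and measures how often the two resulting networks disagree on unlabeled test inputs, using that disagreement rate as a proxy for test error. Below is a minimal sketch of the measurement step only, assuming two already-trained PyTorch classifiers and a label-free test loader; the function and argument names are illustrative, not taken from the paper.

```python
import torch


def disagreement_rate(model_a, model_b, unlabeled_loader, device="cpu"):
    """Fraction of unlabeled test inputs on which two SGD runs disagree.

    Per the paper, this rate serves as an estimate of test error
    without using any test labels.
    """
    model_a.eval()
    model_b.eval()
    disagreements, total = 0, 0
    with torch.no_grad():
        for inputs in unlabeled_loader:  # loader assumed to yield input batches only
            inputs = inputs.to(device)
            preds_a = model_a(inputs).argmax(dim=1)  # hard top-1 predictions
            preds_b = model_b(inputs).argmax(dim=1)
            disagreements += (preds_a != preds_b).sum().item()
            total += inputs.size(0)
    return disagreements / total
```

Disagreement here is taken over hard top-1 predictions, matching the paper's definition of disagreement rate on unlabeled test data; no labels enter the computation.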
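
The Experiment Setup row quotes a grid of five binary hyperparameter choices plus a shared optimizer recipe. As a hedged illustration only (the model builder, data pipeline, and the stop-at-100%-training-accuracy loop are omitted, and the helper names are my own), the grid enumeration and the quoted SGD configuration could look like:

```python
from itertools import product

import torch

# Hyperparameter grid quoted in the Experiment Setup row (Appendix C.1).
GRID = {
    "width_multiplier": [1, 2],
    "initial_lr": [0.1, 0.05],
    "weight_decay": [0.0001, 0.0],
    "minibatch_size": [200, 100],
    "data_augmentation": [False, True],
}


def all_configs(grid=GRID):
    """Yield every combination of the grid as a dict (2**5 = 32 configs)."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def make_optimizer_and_scheduler(model, config):
    """SGD with momentum 0.9 and a 10x learning-rate decay every 50 epochs."""
    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=config["initial_lr"],
        momentum=0.9,
        weight_decay=config["weight_decay"],
    )
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)
    return optimizer, scheduler
```

Each grid configuration would be trained twice with different SGD seeds so that pairwise disagreement can be measured between the two runs.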