Assessing Generalization of SGD via Disagreement

Authors: Yiding Jiang, Vaishnavh Nagarajan, Christina Baek, J. Zico Kolter

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically show that the test error of deep networks can be estimated by training the same architecture on the same training set but with two different runs of Stochastic Gradient Descent (SGD), and then measuring the disagreement rate between the two networks on unlabeled test data." (See the disagreement-rate sketch after the table.)
Researcher Affiliation | Collaboration | Yiding Jiang, Carnegie Mellon University (ydjiang@cmu.edu); Vaishnavh Nagarajan, Google Research (vaishnavh@google.com); Christina Baek and J. Zico Kolter, Carnegie Mellon University ({kbaek,zkolter}@cs.cmu.edu)
Pseudocode | No | The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | No | The paper does not provide an explicit statement about releasing its source code, nor a link to a code repository for the methodology described.
Open Datasets | Yes | "We observe on the SVHN (Netzer et al., 2011), CIFAR-10/100 (Krizhevsky et al., 2009) datasets, and for variants of Residual Networks (He et al., 2016) and Convolutional Networks (Lin et al., 2013)..."
Dataset Splits | No | The paper mentions training and test data, but does not explicitly describe a validation split or the use of a separate validation set for hyperparameter tuning.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types) used to run the experiments.
Software Dependencies | No | The paper names methods and architecture families (momentum SGD; ResNet, CNN, FCN variants), but does not specify version numbers for any software libraries or frameworks used.
Experiment Setup | Yes | "C.1 PAIRWISE DISAGREEMENT ... 1. width multiplier: {1, 2} 2. initial learning rate: {0.1, 0.05} 3. weight decay: {0.0001, 0.0} 4. minibatch size: {200, 100} 5. data augmentation: {No, Yes} ... All models are trained with SGD with momentum of 0.9. The learning rate decays by a factor of 10 every 50 epochs. The training stops when the training accuracy reaches 100%."
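
The Research Type row quotes the paper's core procedure: train two networks of the same architecture on the same training set with different runs of SGD, then measure how often their predictions differ on unlabeled test inputs. The following is a minimal sketch of that disagreement-rate computation, assuming PyTorch; `model_a`, `model_b`, and `unlabeled_loader` are hypothetical placeholders, not artifacts released with the paper.

```python
import torch


@torch.no_grad()
def disagreement_rate(model_a, model_b, unlabeled_loader, device="cpu"):
    """Fraction of unlabeled test points on which the two models' top-1
    predictions differ; the paper uses this as an estimate of test error."""
    model_a.eval()
    model_b.eval()
    disagreements, total = 0, 0
    for batch in unlabeled_loader:
        # Labels, if the loader happens to provide them, are ignored.
        inputs = batch[0] if isinstance(batch, (list, tuple)) else batch
        inputs = inputs.to(device)
        preds_a = model_a(inputs).argmax(dim=1)
        preds_b = model_b(inputs).argmax(dim=1)
        disagreements += (preds_a != preds_b).sum().item()
        total += inputs.size(0)
    return disagreements / total
```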
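
The Experiment Setup row quotes the Appendix C.1 hyperparameter grid and training rule (momentum SGD, step decay of the learning rate, stop at 100% training accuracy). The sketch below mirrors that configuration under the assumption of a PyTorch training loop; only the grid values, the momentum of 0.9, the decay schedule, and the stopping rule come from the paper, while `build_model`, `train_loader`, and the `max_epochs` safeguard are assumptions added here.

```python
import itertools

import torch

# Hyperparameter grid from Appendix C.1 (2^5 = 32 combinations).
GRID = {
    "width_multiplier": [1, 2],
    "initial_lr": [0.1, 0.05],
    "weight_decay": [1e-4, 0.0],
    "minibatch_size": [200, 100],
    "data_augmentation": [False, True],
}


def all_configs(grid=GRID):
    """Enumerate every hyperparameter combination in the grid."""
    for values in itertools.product(*grid.values()):
        yield dict(zip(grid.keys(), values))


def train_to_interpolation(build_model, train_loader, lr, weight_decay,
                           device="cpu", max_epochs=300):
    """Momentum-SGD training that stops once training accuracy hits 100%."""
    model = build_model().to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr,
                                momentum=0.9, weight_decay=weight_decay)
    # Learning rate decays by a factor of 10 every 50 epochs.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)
    criterion = torch.nn.CrossEntropyLoss()
    for _ in range(max_epochs):  # max_epochs is a safeguard, not from the paper
        correct, total = 0, 0
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            loss.backward()
            optimizer.step()
            correct += (outputs.argmax(dim=1) == targets).sum().item()
            total += targets.size(0)
        scheduler.step()
        if correct == total:  # training accuracy reached 100%
            break
    return model
```

Running `train_to_interpolation` twice with the same configuration but independent SGD randomness, then passing the two resulting models to `disagreement_rate` above, reproduces the paper's test-error estimate up to the stated assumptions.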