Assessing Generalization of SGD via Disagreement
Authors: Yiding Jiang, Vaishnavh Nagarajan, Christina Baek, J. Zico Kolter
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show that the test error of deep networks can be estimated by training the same architecture on the same training set but with two different runs of Stochastic Gradient Descent (SGD), and then measuring the disagreement rate between the two networks on unlabeled test data. (See the disagreement-rate sketch after the table.) |
| Researcher Affiliation | Collaboration | Yiding Jiang, Carnegie Mellon University, ydjiang@cmu.edu; Vaishnavh Nagarajan, Google Research, vaishnavh@google.com; Christina Baek and J. Zico Kolter, Carnegie Mellon University, {kbaek,zkolter}@cs.cmu.edu |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We observe on the SVHN (Netzer et al., 2011), CIFAR-10/100 (Krizhevsky et al., 2009) datasets, and for variants of Residual Networks (He et al., 2016) and Convolutional Networks (Lin et al., 2013)... |
| Dataset Splits | No | The paper mentions training data and test data, but does not explicitly provide details about validation dataset splits or the use of a separate validation set for hyperparameter tuning. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types) used to run the experiments. |
| Software Dependencies | No | The paper mentions software components like 'Momentum SGD' and architectural types (ResNet, CNN, FCN), but does not specify version numbers for any software libraries or frameworks used. |
| Experiment Setup | Yes | C.1 PAIRWISE DISAGREEMENT ... 1. width multiplier: {1, 2} 2. initial learning rate: {0.1, 0.05} 3. weight decay: {0.0001, 0.0} 4. minibatch size: {200, 100} 5. data augmentation: {No, Yes} ... All models are trained with SGD with momentum of 0.9. The learning rate decays by a factor of 10 every 50 epochs. The training stops when the training accuracy reaches 100%. (See the optimizer sketch after the table.) |
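
For context on the Research Type evidence above: the paper's estimator trains the same architecture twice with independent SGD runs and measures how often the two resulting networks disagree on unlabeled test inputs, using that disagreement rate as a proxy for test error. Below is a minimal sketch of the measurement step only, assuming two already-trained PyTorch classifiers and a label-free test loader; the function and argument names are illustrative, not taken from the paper.

```python
import torch


def disagreement_rate(model_a, model_b, unlabeled_loader, device="cpu"):
    """Fraction of unlabeled test inputs on which two SGD runs disagree.

    Per the paper, this rate serves as an estimate of test error
    without using any test labels.
    """
    model_a.eval()
    model_b.eval()
    disagreements, total = 0, 0
    with torch.no_grad():
        for inputs in unlabeled_loader:  # loader assumed to yield input batches only
            inputs = inputs.to(device)
            preds_a = model_a(inputs).argmax(dim=1)  # hard top-1 predictions
            preds_b = model_b(inputs).argmax(dim=1)
            disagreements += (preds_a != preds_b).sum().item()
            total += inputs.size(0)
    return disagreements / total
```

Disagreement here is taken over hard top-1 predictions, matching the paper's definition of disagreement rate on unlabeled test data; no labels enter the computation.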
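
The Experiment Setup row quotes a grid of five binary hyperparameter choices plus a shared optimizer recipe. As a hedged illustration only (the model builder, data pipeline, and the stop-at-100%-training-accuracy loop are omitted, and the helper names are my own), the grid enumeration and the quoted SGD configuration could look like:

```python
from itertools import product

import torch

# Hyperparameter grid quoted in the Experiment Setup row (Appendix C.1).
GRID = {
    "width_multiplier": [1, 2],
    "initial_lr": [0.1, 0.05],
    "weight_decay": [0.0001, 0.0],
    "minibatch_size": [200, 100],
    "data_augmentation": [False, True],
}


def all_configs(grid=GRID):
    """Yield every combination of the grid as a dict (2**5 = 32 configs)."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def make_optimizer_and_scheduler(model, config):
    """SGD with momentum 0.9 and a 10x learning-rate decay every 50 epochs."""
    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=config["initial_lr"],
        momentum=0.9,
        weight_decay=config["weight_decay"],
    )
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)
    return optimizer, scheduler
```

Each grid configuration would be trained twice with different SGD seeds so that pairwise disagreement can be measured between the two runs.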