Obtaining Adjustable Regularization for Free via Iterate Averaging
Authors: Jingfeng Wu, Vladimir Braverman, Lin Yang
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical studies on both synthetic and real datasets verify our theory. Moreover, we test iterate averaging with modern deep neural networks on CIFAR-10 and CIFAR-100 datasets, and the proposed approaches still obtain effective and adjustable regularization effects with little additional computation, demonstrating the broad applicability of our methods. |
| Researcher Affiliation | Academia | ¹Johns Hopkins University, Baltimore, MD, USA; ²University of California, Los Angeles, CA, USA. |
| Pseudocode | No | The paper describes algorithms and update rules using mathematical equations (e.g., equations 1, 2, 3, 4, 5, 6), but it does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/uuujf/IterAvg. |
| Open Datasets | Yes | We then present experiments on the MNIST dataset. [...] We train VGG-16 (Simonyan & Zisserman, 2014) and ResNet-18 (He et al., 2016) on CIFAR-10 and CIFAR-100 datasets... |
| Dataset Splits | No | The paper mentions experiments on MNIST, CIFAR-10, and CIFAR-100, but it does not explicitly provide details about dataset splits, such as percentages or specific sample counts for training, validation, and testing. |
| Hardware Specification | Yes | The running times are measured by performing the experiments using a single K80 GPU. |
| Software Dependencies | No | The paper mentions the use of specific models like VGG-16 and ResNet-18, and standard tricks like batch normalization, but it does not provide specific version numbers for any software dependencies, libraries, or frameworks used (e.g., PyTorch version, TensorFlow version, CUDA version). |
| Experiment Setup | Yes | The models are trained for 300 epochs using SGD. We perform epoch averaging using the 240 checkpoints saved from the 61st to the 300th epoch. The first 60 epochs are skipped since the models in the early phase are extremely unstable. After averaging the parameters, we apply a trick proposed by Izmailov et al. (2018) to handle the batch normalization statistics which are not trained by SGD. Specifically, we make a forward pass on the training data to compute the activation statistics for the batch normalization layers. For the choice of averaging scheme, we test standard geometric distribution with success probability p ∈ {0.9999, 0.999, 0.99, 0.9}. |
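
The Experiment Setup row describes a concrete procedure: average 240 saved checkpoints with geometric weights, then refresh batch-norm statistics with one pass over the training data. Below is a minimal sketch of that procedure, assuming a PyTorch implementation with checkpoints stored as `state_dict` files; the file layout, helper names, and the convention that the weights decay from the most recent checkpoint backward are assumptions for illustration, not details given by the paper.

```python
# Minimal sketch (assumption: PyTorch) of epoch averaging with truncated geometric
# weights, followed by the batch-norm statistics refresh of Izmailov et al. (2018).
# Checkpoint paths, helper names, and the weighting direction are illustrative.
import torch


def geometric_weights(num_ckpts, p):
    """Truncated geometric weights with success probability p, renormalized to sum to 1.
    Index k = 0 is taken here to be the most recent checkpoint (an assumed convention)."""
    w = torch.tensor([p * (1.0 - p) ** k for k in range(num_ckpts)])
    return w / w.sum()


def average_checkpoints(ckpt_paths, weights):
    """Weighted average of model state dicts; non-float buffers (e.g. BN counters)
    are copied from the first checkpoint and recomputed later anyway."""
    avg = None
    for path, w in zip(ckpt_paths, weights):
        state = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {k: (w * v if v.is_floating_point() else v.clone())
                   for k, v in state.items()}
        else:
            for k, v in state.items():
                if v.is_floating_point():
                    avg[k] += w * v
    return avg


@torch.no_grad()
def refresh_bn_statistics(model, train_loader, device="cpu"):
    """One forward pass over the training data to recompute batch-norm running
    statistics, which parameter averaging does not produce."""
    for m in model.modules():
        if isinstance(m, torch.nn.modules.batchnorm._BatchNorm):
            m.reset_running_stats()
    model.train()
    for x, _ in train_loader:
        model(x.to(device))


# Hypothetical usage matching the setup quoted above:
# ckpts = [f"ckpt_epoch_{e}.pt" for e in range(300, 60, -1)]   # 240 checkpoints, newest first
# model.load_state_dict(average_checkpoints(ckpts, geometric_weights(len(ckpts), 0.99)))
# refresh_bn_statistics(model, train_loader)
```

The weights are renormalized because the geometric distribution is truncated at the 240 available checkpoints, and whether the decay runs from the newest or the oldest checkpoint is not stated in the quoted excerpt, so the convention above is only one plausible reading.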