Disentangling the Mechanisms Behind Implicit Regularization in SGD

Authors: Zachary Novack, Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary Chase Lipton

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this paper, we conduct an extensive empirical evaluation, focusing on the ability of various theorized mechanisms to close the small-to-large batch generalization gap."
Researcher Affiliation | Academia | "Zachary Novack, UC San Diego, znovack@ucsd.edu; Simran Kaur, Princeton University, skaur@princeton.edu; Tanya Marwah, Carnegie Mellon University, tmarwah@andrew.cmu.edu; Saurabh Garg, Carnegie Mellon University, sgarg2@andrew.cmu.edu; Zachary Lipton, Carnegie Mellon University, zlipton@andrew.cmu.edu"
Pseudocode | No | No pseudocode or algorithm block is explicitly presented in the paper.
Open Source Code | Yes | "The source code for reproducing the work presented here, including all hyperparameters and random seeds, is available at https://github.com/acmi-lab/imp-regularizers. Additional experimental details are available in Appendix A.5."
Open Datasets | Yes | "on the CIFAR10, CIFAR100, Tiny-ImageNet, and SVHN image classification benchmarks (Krizhevsky, 2009; Le and Yang, 2015; Netzer et al., 2011)."
Dataset Splits | No | "Figure 1: Validation Accuracy and Average Micro-batch (|M| = 128) Gradient Norm for CIFAR10/100 Regularization Experiments, averaged across runs (plots also smoothed for clarity)." (See the gradient-norm sketch after this table.)
Hardware Specification | Yes | "All experiments were run on a single RTX A6000 NVidia GPU."
Software Dependencies | No | "All experiments run for the present paper were performed using the Pytorch deep learning API, and source code can be found here: https://github.com/anon2023ICLR/imp-regularizers."
Experiment Setup | Yes | "Additional experimental details can be found in Appendix A.5. Values for our hyperparameters in our main experiments are detailed below: Table 8: Learning rate (η) used in main experiments... Table 9: Regularization strength (λ) used in main experiments... All experiments were run for 50000 update iterations. No weight decay or momentum was used in any of the experiments." (See the training-setup sketch after this table.)
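To make the Experiment Setup row concrete, here is a minimal PyTorch sketch of the training regime it describes: plain SGD with no momentum and no weight decay, run for 50000 update iterations. The learning rate, batch size, and ResNet-18 architecture below are placeholder assumptions for illustration only; the paper's actual values are given in its Tables 8-9 and in the released code at https://github.com/acmi-lab/imp-regularizers.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

# Placeholder setup: the paper's actual learning rate, batch size, and
# architecture come from its Tables 8-9 and released configs, not from here.
train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=128, shuffle=True)

model = models.resnet18(num_classes=10)
# Plain SGD: no momentum, no weight decay, as stated in the Experiment Setup row.
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.0, weight_decay=0.0)
loss_fn = torch.nn.CrossEntropyLoss()

num_updates, step = 50_000, 0  # 50000 update iterations
while step < num_updates:
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
        step += 1
        if step >= num_updates:
            break
```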
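The Dataset Splits row quotes the Figure 1 caption, which tracks the average micro-batch (|M| = 128) gradient norm. The sketch below shows one way such a quantity can be computed in PyTorch; the `model`, `loss_fn`, and batch tensors are assumed inputs, and this is an illustration rather than the authors' implementation.

```python
import torch

def avg_microbatch_grad_norm(model, loss_fn, batch_x, batch_y, microbatch_size=128):
    """Average L2 norm of per-micro-batch gradients over one large batch.

    Illustrative sketch of the quantity plotted in Figure 1 (|M| = 128);
    not the authors' code.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    norms = []
    for start in range(0, batch_x.size(0), microbatch_size):
        xb = batch_x[start:start + microbatch_size]
        yb = batch_y[start:start + microbatch_size]
        loss = loss_fn(model(xb), yb)
        grads = torch.autograd.grad(loss, params)
        sq_norm = sum(g.pow(2).sum() for g in grads)
        norms.append(sq_norm.sqrt().item())
    return sum(norms) / len(norms)
```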