Disentangling the Mechanisms Behind Implicit Regularization in SGD

Authors: Zachary Novack, Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary Chase Lipton

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this paper, we conduct an extensive empirical evaluation, focusing on the ability of various theorized mechanisms to close the small-to-large batch generalization gap."
Researcher Affiliation | Academia | "Zachary Novack, UC San Diego, znovack@ucsd.edu; Simran Kaur, Princeton University, skaur@princeton.edu; Tanya Marwah, Carnegie Mellon University, tmarwah@andrew.cmu.edu; Saurabh Garg, Carnegie Mellon University, sgarg2@andrew.cmu.edu; Zachary Lipton, Carnegie Mellon University, zlipton@andrew.cmu.edu"
Pseudocode | No | No pseudocode or algorithm block is explicitly presented in the paper.
Open Source Code | Yes | "The source code for reproducing the work presented here, including all hyperparameters and random seeds, is available at https://github.com/acmi-lab/imp-regularizers. Additional experimental details are available in Appendix A.5."
Open Datasets | Yes | "on the CIFAR10, CIFAR100, Tiny-ImageNet, and SVHN image classification benchmarks (Krizhevsky, 2009; Le and Yang, 2015; Netzer et al., 2011)."
Dataset Splits | No | "Figure 1: Validation Accuracy and Average Micro-batch (|M| = 128) Gradient Norm for CIFAR10/100 Regularization Experiments, averaged across runs (plots also smoothed for clarity)." (See the gradient-norm sketch after this table.)
Hardware Specification | Yes | "All experiments were run on a single RTX A6000 NVidia GPU."
Software Dependencies | No | "All experiments run for the present paper were performed using the Pytorch deep learning API, and source code can be found here: https://github.com/anon2023ICLR/imp-regularizers."
Experiment Setup | Yes | "Additional experimental details can be found in Appendix A.5. Values for our hyperparameters in our main experiments are detailed below: Table 8: Learning rate (η) used in main experiments... Table 9: Regularization strength (λ) used in main experiments... All experiments were run for 50000 update iterations. No weight decay or momentum was used in any of the experiments." (See the training-setup sketch after this table.)
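To make the Experiment Setup row concrete, here is a minimal PyTorch sketch of the training regime it describes: plain SGD with no momentum and no weight decay, run for 50000 update iterations. The learning rate, batch size, and ResNet-18 architecture below are placeholder assumptions for illustration only; the paper's actual values are given in its Tables 8-9 and in the released code at https://github.com/acmi-lab/imp-regularizers.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

# Placeholder setup: the paper's actual learning rate, batch size, and
# architecture come from its Tables 8-9 and released configs, not from here.
train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=128, shuffle=True)

model = models.resnet18(num_classes=10)
# Plain SGD: no momentum, no weight decay, as stated in the Experiment Setup row.
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.0, weight_decay=0.0)
loss_fn = torch.nn.CrossEntropyLoss()

num_updates, step = 50_000, 0  # 50000 update iterations
while step < num_updates:
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
        step += 1
        if step >= num_updates:
            break
```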
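The Dataset Splits row quotes the Figure 1 caption, which tracks the average micro-batch (|M| = 128) gradient norm. The sketch below shows one way such a quantity can be computed in PyTorch; the `model`, `loss_fn`, and batch tensors are assumed inputs, and this is an illustration rather than the authors' implementation.

```python
import torch

def avg_microbatch_grad_norm(model, loss_fn, batch_x, batch_y, microbatch_size=128):
    """Average L2 norm of per-micro-batch gradients over one large batch.

    Illustrative sketch of the quantity plotted in Figure 1 (|M| = 128);
    not the authors' code.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    norms = []
    for start in range(0, batch_x.size(0), microbatch_size):
        xb = batch_x[start:start + microbatch_size]
        yb = batch_y[start:start + microbatch_size]
        loss = loss_fn(model(xb), yb)
        grads = torch.autograd.grad(loss, params)
        sq_norm = sum(g.pow(2).sum() for g in grads)
        norms.append(sq_norm.sqrt().item())
    return sum(norms) / len(norms)
```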