Disentangling the Mechanisms Behind Implicit Regularization in SGD
Authors: Zachary Novack, Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary Chase Lipton
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we conduct an extensive empirical evaluation, focusing on the ability of various theorized mechanisms to close the small-to-large batch generalization gap. |
| Researcher Affiliation | Academia | Zachary Novack (UC San Diego, znovack@ucsd.edu); Simran Kaur (Princeton University, skaur@princeton.edu); Tanya Marwah (Carnegie Mellon University, tmarwah@andrew.cmu.edu); Saurabh Garg (Carnegie Mellon University, sgarg2@andrew.cmu.edu); Zachary Lipton (Carnegie Mellon University, zlipton@andrew.cmu.edu) |
| Pseudocode | No | No pseudocode or algorithm block is explicitly presented in the paper. |
| Open Source Code | Yes | The source code for reproducing the work presented here, including all hyperparameters and random seeds, is available at https://github.com/acmi-lab/imp-regularizers. Additional experimental details are available in Appendix A.5. |
| Open Datasets | Yes | on the CIFAR10, CIFAR100, Tiny-ImageNet, and SVHN image classification benchmarks (Krizhevsky, 2009; Le and Yang, 2015; Netzer et al., 2011). |
| Dataset Splits | No | The paper reports validation accuracy (Figure 1: Validation Accuracy and Average Micro-batch (|M| = 128) Gradient Norm for CIFAR10/100 Regularization Experiments, averaged across runs, plots smoothed for clarity) but does not explicitly state the train/validation split sizes or how they were constructed. A sketch of the micro-batch gradient-norm metric appears after the table. |
| Hardware Specification | Yes | All experiments were run on a single NVIDIA RTX A6000 GPU. |
| Software Dependencies | No | No specific library versions are pinned; the paper states only that all experiments were performed using the PyTorch deep learning API, with source code at https://github.com/anon2023ICLR/imp-regularizers. |
| Experiment Setup | Yes | Additional experimental details can be found in Appendix A.5. Values for our hyperparameters in our main experiments are detailed below: Table 8: Learning rate (η) used in main experiments... Table 9: Regularization strength (λ) used in main experiments... All experiments were run for 50000 update iterations. No weight decay or momentum was used in any of the experiments. |
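The Experiment Setup row pins down the optimizer configuration precisely: plain SGD, no momentum, no weight decay, a fixed budget of 50000 update iterations. Below is a minimal PyTorch sketch under exactly those constraints; the `train` helper and the `lr` argument are illustrative stand-ins for the per-dataset values in Tables 8 and 9 and for the authors' actual code at https://github.com/acmi-lab/imp-regularizers.

```python
import torch

def train(model, train_loader, loss_fn, lr, num_updates=50_000):
    """Training loop matching the reported setup: plain SGD with no momentum
    and no weight decay, run for a fixed number of update iterations.

    NOTE: `lr` is a placeholder for the per-dataset learning rates in Table 8,
    which are not reproduced in this report.
    """
    opt = torch.optim.SGD(model.parameters(), lr=lr,
                          momentum=0.0, weight_decay=0.0)
    step = 0
    while step < num_updates:
        for x, y in train_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
            step += 1
            if step >= num_updates:  # count updates, not epochs
                return model
    return model
```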
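Figure 1 (quoted in the Dataset Splits row) tracks the average micro-batch (|M| = 128) gradient norm alongside validation accuracy. The sketch below shows one plausible way to compute such a metric in PyTorch; the function name, looping scheme, and micro-batch slicing are assumptions for illustration, not the authors' implementation.

```python
import torch

def avg_microbatch_grad_norm(model, batch, loss_fn, micro_size=128):
    """Average L2 norm of gradients computed on micro-batches of a batch
    (|M| = 128 in the paper's Figure 1).

    NOTE: illustrative sketch only; the authors' exact computation lives at
    https://github.com/acmi-lab/imp-regularizers.
    """
    xs, ys = batch
    norms = []
    for i in range(0, len(xs), micro_size):
        model.zero_grad()
        loss = loss_fn(model(xs[i:i + micro_size]), ys[i:i + micro_size])
        loss.backward()
        # Flatten all parameter gradients into one vector; take its L2 norm.
        grad = torch.cat([p.grad.flatten() for p in model.parameters()
                          if p.grad is not None])
        norms.append(grad.norm().item())
    model.zero_grad()
    return sum(norms) / len(norms)
```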