On the training dynamics of deep networks with $L_2$ regularization

Authors: Aitor Lewkowycz, Guy Gur-Ari

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "These empirical relations hold when the network is overparameterized. They can be used to predict the optimal regularization parameter of a given model. In addition, based on these observations we propose a dynamical schedule for the regularization parameter that improves performance and speeds up training. We test these proposals in modern image classification settings." ... "We now turn to an empirical study of networks trained with $L_2$ regularization. In this section we present results for a fully-connected network trained on MNIST, a Wide ResNet [Zagoruyko and Komodakis, 2016] trained on CIFAR-10, and CNNs trained on CIFAR-10." (A hedged sketch of such a dynamical schedule appears after this table.)
Researcher Affiliation | Industry | Aitor Lewkowycz, Google, Mountain View, CA (alewkowycz@google.com); Guy Gur-Ari, Google, Mountain View, CA (guyga@google.com)
Pseudocode | No | The paper describes methods and theoretical derivations but contains no explicitly labeled "Pseudocode" or "Algorithm" blocks, nor does it present structured steps in a code-like format.
Open Source Code | No | The paper neither states that the source code for the described methodology is released nor provides a link to a code repository.
Open Datasets | Yes | "In this section we present results for a fully-connected network trained on MNIST, a Wide ResNet [Zagoruyko and Komodakis, 2016] trained on CIFAR-10, and CNNs trained on CIFAR-10."
Dataset Splits | No | The paper evaluates on MNIST and CIFAR-10 and reports test accuracy, but the main text gives no explicit train/validation/test split percentages, sample counts, or splitting methodology.
Hardware Specification | No | The paper does not describe the specific hardware (e.g., GPU models, CPU types, or TPU versions) used to run its experiments.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., PyTorch 1.9 or Python 3.8) for its experimental setup.
Experiment Setup | Yes | "Figure 1: Wide ResNet 28-10 trained on CIFAR-10 with momentum and data augmentation." ... "a Wide ResNet trained on CIFAR-10 with momentum = 0.9, learning rate = 0.2 and data augmentation." (A sketch reflecting these hyperparameters follows the table.)
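
The quoted Experiment Setup row translates directly into a short training-loop sketch. The following is a minimal, hypothetical PyTorch rendering: only the momentum (0.9) and learning rate (0.2) come from the paper's quoted caption, while the stand-in model, the batch of random data, and the value of the $L_2$ coefficient `lam` are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in model: a small fully-connected net (the paper also trains an
# FC network on MNIST; this architecture is a placeholder, not the authors').
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 512), nn.ReLU(), nn.Linear(512, 10))

# Hyperparameters quoted in the report: momentum = 0.9, learning rate = 0.2.
optimizer = torch.optim.SGD(model.parameters(), lr=0.2, momentum=0.9)

# L2 coefficient; illustrative value only -- predicting the optimal value of
# this knob is exactly what the paper's empirical relations are about.
lam = 5e-4

def l2_penalty(m: nn.Module) -> torch.Tensor:
    # Sum of squared parameters, added to the loss as (lam / 2) * ||theta||^2.
    return sum(p.pow(2).sum() for p in m.parameters())

def train_step(x: torch.Tensor, y: torch.Tensor) -> float:
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + 0.5 * lam * l2_penalty(model)
    loss.backward()
    optimizer.step()
    return float(loss)

# Smoke test on random MNIST-shaped data.
x = torch.randn(32, 1, 28, 28)
y = torch.randint(0, 10, (32,))
print(train_step(x, y))
```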
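The Research Type row quotes the paper's proposal of "a dynamical schedule for the regularization parameter," but the rule itself is not quoted here. The sketch below therefore implements one plausible plateau-based variant: decay `lam` whenever validation accuracy stops improving. The trigger, decay factor, and patience are assumptions for illustration, not the paper's algorithm.

```python
class L2Schedule:
    """Plateau-based decay of the L2 coefficient lam (an illustrative
    assumption; the paper's actual schedule may use a different criterion)."""

    def __init__(self, lam_init: float = 1e-2, decay: float = 0.5, patience: int = 3):
        self.lam = lam_init
        self.decay = decay
        self.patience = patience
        self.best_acc = float("-inf")
        self.stale_epochs = 0

    def step(self, val_acc: float) -> float:
        # Call once per epoch with the current validation accuracy;
        # returns the lam to use for the next epoch.
        if val_acc > self.best_acc:
            self.best_acc = val_acc
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
            if self.stale_epochs >= self.patience:
                self.lam *= self.decay
                self.stale_epochs = 0
        return self.lam
```

In the training loop above, one would replace the fixed `lam` with the value returned by `step()` once per epoch, so that regularization is strong early and anneals as training progresses, in the spirit of the quoted claim that such a schedule "improves performance and speeds up training."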