On the training dynamics of deep networks with $L_2$ regularization
Authors: Aitor Lewkowycz, Guy Gur-Ari
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | These empirical relations hold when the network is overparameterized. They can be used to predict the optimal regularization parameter of a given model. In addition, based on these observations we propose a dynamical schedule for the regularization parameter that improves performance and speeds up training. We test these proposals in modern image classification settings. We now turn to an empirical study of networks trained with $L_2$ regularization. In this section we present results for a fully-connected network trained on MNIST, a Wide ResNet [Zagoruyko and Komodakis, 2016] trained on CIFAR-10, and CNNs trained on CIFAR-10. (A hypothetical illustration of such a schedule appears after the table.) |
| Researcher Affiliation | Industry | Aitor Lewkowycz, Google, Mountain View, CA (alewkowycz@google.com); Guy Gur-Ari, Google, Mountain View, CA (guyga@google.com) |
| Pseudocode | No | The paper describes methods and theoretical derivations but does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the described methodology, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | In this section we present results for a fully-connected network trained on MNIST, a Wide ResNet [Zagoruyko and Komodakis, 2016] trained on CIFAR-10, and CNNs trained on CIFAR-10. |
| Dataset Splits | No | The paper mentions evaluating on datasets like MNIST and CIFAR-10 and refers to 'Test accuracy' but does not explicitly provide specific train/validation/test dataset split percentages, sample counts, or detailed splitting methodology in the main text. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, or TPU versions) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., PyTorch 1.9 or Python 3.8) for its experimental setup. |
| Experiment Setup | Yes | Figure 1: Wide ResNet 28-10 trained on CIFAR-10 with momentum and data augmentation. A Wide ResNet trained on CIFAR-10 with momentum = 0.9, learning rate = 0.2, and data augmentation. (A minimal training-setup sketch based on these values appears below the table.) |
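The Experiment Setup row quotes concrete hyperparameters: SGD with momentum = 0.9, learning rate = 0.2, data augmentation, and a Wide ResNet 28-10 on CIFAR-10. Below is a minimal sketch of such a setup, not the authors' code: it assumes PyTorch, substitutes torchvision's `resnet18` as a stand-in for the Wide ResNet 28-10 (which torchvision does not ship), and uses a placeholder value for the $L_2$ coefficient, the quantity whose optimal value the paper's empirical relations predict.

```python
import torch
import torchvision

# Stand-in architecture: resnet18 is used purely as a placeholder for the
# Wide ResNet 28-10 quoted in the Experiment Setup row.
model = torchvision.models.resnet18(num_classes=10)

# lr and momentum are taken from the Experiment Setup row; weight_decay is
# the L2 coefficient lambda, and 1e-4 is a hypothetical placeholder value.
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.2,
    momentum=0.9,
    weight_decay=1e-4,
)
loss_fn = torch.nn.CrossEntropyLoss()

def train_step(x, y):
    """One SGD step. weight_decay adds lambda * w to each parameter's
    gradient, equivalent to adding (lambda / 2) * ||w||^2 to the loss."""
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```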
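The Research Type row also quotes the paper's proposal of "a dynamical schedule for the regularization parameter." This card does not specify how that schedule works, so the sketch below is only one plausible instantiation, not the paper's method: $\lambda$ starts large and is decayed multiplicatively when validation accuracy plateaus. The class name, thresholds, and the decay rule itself are all assumptions.

```python
class L2Schedule:
    """Hypothetical dynamical schedule for the L2 coefficient lambda:
    decay lambda when validation accuracy stops improving."""

    def __init__(self, lam=1e-3, decay=0.1, patience=5):
        self.lam = lam            # current L2 coefficient (assumed start value)
        self.decay = decay        # multiplicative decay factor (assumed)
        self.patience = patience  # epochs without improvement before decaying
        self.best_acc = 0.0
        self.stale = 0

    def step(self, val_acc):
        if val_acc > self.best_acc:
            self.best_acc, self.stale = val_acc, 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lam *= self.decay
                self.stale = 0
        return self.lam
```

With the optimizer from the previous sketch, the schedule would be applied once per epoch, e.g. `optimizer.param_groups[0]["weight_decay"] = schedule.step(val_acc)`.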