On Lazy Training in Differentiable Programming
Authors: Lénaïc Chizat, Edouard Oyallon, Francis Bach
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our numerical experiments bring a critical note, as we observe that the performance of commonly used non-linear deep convolutional neural networks in computer vision degrades when trained in the lazy regime. |
| Researcher Affiliation | Academia | Lénaïc Chizat, CNRS, Université Paris-Sud, Orsay, France, lenaic.chizat@u-psud.fr; Edouard Oyallon, CentraleSupélec, INRIA, Gif-sur-Yvette, France, edouard.oyallon@centralesupelec.fr; Francis Bach, INRIA, ENS, PSL Research University, Paris, France, francis.bach@inria.fr |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code to reproduce these experiments is available online: https://github.com/edouardoyallon/lazy-training-CNN |
| Open Datasets | Yes | We consider the VGG-11 model [32], which is a widely used model on CIFAR10. |
| Dataset Splits | No | The paper mentions 'test loss' and 'test accuracy', implying train/test splits for the synthetic data and CIFAR-10, but it does not provide explicit percentages, sample counts, or citations for predefined training, validation, or test splits, and no validation set is described. |
| Hardware Specification | No | The paper mentions 'a GPU donation from NVIDIA' in the acknowledgments but does not specify the model of the GPU or any other specific hardware components (CPU, memory, etc.) used for the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | We trained it via mini-batch SGD with a momentum parameter of 0.9... An initial learning rate η0 is linearly decayed at each epoch, following ηt = η0/(1+βt). The biases are initialized with 0 and all other weights are initialized with normal Xavier initialization [13]... The model h is trained for the square loss multiplied by 1/α^2... with standard data-augmentation, batch-size of 128 [35] and η0 = 1... The total number of epochs is 70... We choose α = 10^7... a batch-size of 8 and, after cross-validation, η0 = 0.01, 1.0... We also multiply the initial weights by respectively 1.2 and 1.3 for the ResNet-18 and VGG-11... (see the code sketch below the table) |
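
As a reading aid, below is a minimal PyTorch-style sketch of the setup quoted in the Experiment Setup row. Only the quoted values (momentum 0.9, η0 = 1, 70 epochs, batch size 128, α = 10^7, normal Xavier initialization, the 1/α^2 square-loss scaling, and the ηt = η0/(1+βt) decay) come from the paper; the β value, the data pipeline, and the α-rescaled, initialization-centred output (the usual lazy-training parameterization) are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of the quoted training setup (PyTorch assumed); placeholder values are marked.
import copy

import torch
import torch.nn as nn
import torchvision

def xavier_init(module):
    # Normal Xavier initialization for weights, zeros for biases (as quoted).
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_normal_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

alpha = 1e7    # scaling factor alpha = 10^7 (quoted)
eta0 = 1.0     # initial learning rate for VGG-11 on CIFAR-10 (quoted)
beta = 0.1     # decay coefficient in eta_t = eta0 / (1 + beta * t); value assumed
epochs = 70    # total number of epochs (quoted)

model = torchvision.models.vgg11(num_classes=10)   # VGG-11 on CIFAR-10 (quoted model choice)
model.apply(xavier_init)
model_init = copy.deepcopy(model).eval()            # frozen copy at initialization (assumed, see lead-in)
for p in model_init.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.SGD(model.parameters(), lr=eta0, momentum=0.9)  # momentum 0.9 (quoted)
# Learning rate decayed each epoch t as eta_t = eta0 / (1 + beta * t) (quoted schedule).
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda t: 1.0 / (1.0 + beta * t))

def lazy_square_loss(output, output_init, target_onehot):
    # Square loss multiplied by 1/alpha^2 (quoted); the alpha-scaled, init-centred
    # output is the usual lazy-training parameterization and is an assumption here.
    scaled = alpha * (output - output_init)
    return ((scaled - target_onehot) ** 2).mean() / alpha ** 2

# train_loader: CIFAR-10 with standard data augmentation, batch size 128 (quoted; loader assumed).
# for epoch in range(epochs):
#     for images, labels in train_loader:
#         optimizer.zero_grad()
#         targets = nn.functional.one_hot(labels, num_classes=10).float()
#         loss = lazy_square_loss(model(images), model_init(images), targets)
#         loss.backward()
#         optimizer.step()
#     scheduler.step()
```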