On Lazy Training in Differentiable Programming

Authors: Lénaïc Chizat, Edouard Oyallon, Francis Bach

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our numerical experiments bring a critical note, as we observe that the performance of commonly used non-linear deep convolutional neural networks in computer vision degrades when trained in the lazy regime.
Researcher Affiliation | Academia | Lénaïc Chizat, CNRS, Université Paris-Sud, Orsay, France (lenaic.chizat@u-psud.fr); Edouard Oyallon, CentraleSupélec, INRIA, Gif-sur-Yvette, France (edouard.oyallon@centralesupelec.fr); Francis Bach, INRIA, ENS, PSL Research University, Paris, France (francis.bach@inria.fr)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "The code to reproduce these experiments is available online" (footnote 7: https://github.com/edouardoyallon/lazy-training-CNN).
Open Datasets | Yes | We consider the VGG-11 model [32], which is a widely used model on CIFAR10.
Dataset Splits | No | The paper mentions 'test loss' and 'test accuracy', suggesting train/test splits for the synthetic and CIFAR-10 datasets, but it does not give specific percentages, sample counts, or citations for predefined training, validation, or test splits; no validation set is mentioned.
Hardware Specification | No | The paper mentions 'a GPU donation from NVIDIA' in the acknowledgments but does not specify the GPU model or any other hardware (CPU, memory, etc.) used for the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup | Yes | We trained it via mini-batch SGD with a momentum parameter of 0.9... An initial learning rate η_0 is linearly decayed at each epoch, following η_t = η_0 / (1 + βt). The biases are initialized with 0 and all other weights are initialized with normal Xavier initialization [13]... The model h is trained for the square loss multiplied by 1/α^2... with standard data-augmentation, batch-size of 128 [35] and η_0 = 1... The total number of epochs is 70... We choose α = 10^7... a batch-size of 8 and, after cross-validation, η_0 = 0.01, 1.0... We also multiply the initial weights by respectively 1.2 and 1.3 for the ResNet-18 and VGG-11...
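The experiment setup quoted above can be summarized in a short sketch. This is a minimal illustration, not the authors' released code (see the repository linked above for that): the scale alpha = 1e7, the decay constant beta, the initial learning rate eta0, and the torchvision VGG-11 backbone standing in for the paper's CIFAR-adapted VGG-11 are all assumptions chosen to match the quoted description (square loss of the rescaled model multiplied by 1/α^2, SGD with momentum 0.9, Xavier-normal weights with zero biases, and η_t = η_0 / (1 + βt) decayed once per epoch).

```python
# Minimal sketch of the lazy-training recipe described above (not the authors' code).
# Assumed placeholders: alpha, beta, eta0, num_epochs, and torchvision's VGG-11
# standing in for the CIFAR-adapted VGG-11 used in the paper.
import torch
import torch.nn as nn
import torchvision

alpha = 1e7            # lazy-training scale (the paper quotes alpha = 10^7)
eta0, beta = 1.0, 0.1  # initial learning rate and decay constant (beta is an assumed value)
num_epochs = 70        # total number of epochs, as quoted

net = torchvision.models.vgg11(num_classes=10)  # stand-in for the CIFAR-10 VGG-11

# Zero biases and Xavier (normal) initialization for all other weights, as quoted.
for m in net.modules():
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_normal_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

opt = torch.optim.SGD(net.parameters(), lr=eta0, momentum=0.9)

def lazy_loss(outputs, targets_onehot):
    # Square loss of the rescaled model alpha * h, multiplied by 1/alpha^2 so the
    # objective stays finite as alpha grows. The paper's exact formulation may
    # differ (e.g., it may also center by the output at initialization).
    return ((alpha * outputs - targets_onehot) ** 2).mean() / alpha**2

for epoch in range(num_epochs):
    # Learning rate decayed at each epoch: eta_t = eta0 / (1 + beta * t).
    for group in opt.param_groups:
        group["lr"] = eta0 / (1 + beta * epoch)
    # ... one pass over the CIFAR-10 training loader (with standard data
    # augmentation and batch size 128) would go here ...
```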