Greedy Layerwise Learning Can Scale To ImageNet

Authors: Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "we study CNNs on image classification tasks using the large-scale ImageNet dataset and the CIFAR-10 dataset. Using a simple set of ideas for architecture and training we find that solving sequential 1-hidden-layer auxiliary problems lead to a CNN that exceeds AlexNet performance on ImageNet."
Researcher Affiliation | Academia | "(1) Mila, University of Montreal (2) University of California, Berkeley (3) CentraleSupelec, University of Paris-Saclay / INRIA Saclay."
Pseudocode | Yes | "Algorithm 1 Layerwise CNN" (a hedged sketch of this training loop appears after the table)
Open Source Code | No | No explicit statement about providing access to source code or a link to a code repository was found in the paper.
Open Datasets | Yes | "We performed experiments on the large-scale ImageNet-1k (Russakovsky et al., 2015), a major catalyst for the recent popularity of deep learning, as well as the CIFAR-10 dataset. CIFAR-10 consists of small RGB images with respectively 50k and 10k samples for training and testing."
Dataset Splits | Yes | "CIFAR-10 consists of small RGB images with respectively 50k and 10k samples for training and testing. ImageNet consists of 1.2M RGB images of varying size for training. Our final trained model achieves 79.7% top-5 single crop accuracy on the validation set."
Hardware Specification | No | Only general hardware information ("We use 4 GPUs to train our ImageNet models.") is provided, lacking specific GPU model numbers, processor types, or memory details.
Software Dependencies | No | The paper mentions optimization algorithms (SGD) and data augmentation techniques but does not name any software frameworks or version numbers needed for reproducibility.
Experiment Setup | Yes | CIFAR-10: "We use the standard data augmentation and optimize each layer with SGD using a momentum of 0.9 and a batch-size of 128. The initial learning rate is 0.1 and we use the reduced schedule with decays of 0.2 every 15 epochs (Zagoruyko & Komodakis, 2016), for a total of 50 epochs in each layer." ImageNet: "We used SGD with momentum 0.9, weight-decay of 10^-4 for a batch size of 256. The initial learning rate is 0.1 (He et al., 2016) and we use the reduced schedule with decays of 0.1 every 20 epochs for 45 epochs." (A hedged optimizer/schedule sketch for the ImageNet settings appears after the table.)
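
Since the paper provides pseudocode (Algorithm 1) but no released code, the following is a minimal sketch of greedy layerwise training with 1-hidden-layer auxiliary problems, written in PyTorch for illustration. The block structure, auxiliary-head design, stage widths, and function names are assumptions, not the authors' implementation; only the quoted CIFAR-10 optimizer settings (SGD, momentum 0.9, lr 0.1 decayed by 0.2 every 15 epochs, 50 epochs per layer) come from the table above.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    def make_block(in_ch, out_ch):
        # One convolutional stage trained greedily; the 3x3/BN/ReLU/pool layout
        # and the widths passed in are illustrative assumptions.
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )


    def make_aux_head(channels, num_classes):
        # Auxiliary classifier for the current stage: pooled features plus a
        # linear layer, so the new stage + head forms a 1-hidden-layer problem.
        return nn.Sequential(
            nn.AdaptiveAvgPool2d(2),
            nn.Flatten(),
            nn.Linear(channels * 4, num_classes),
        )


    def train_layerwise(widths, train_loader, num_classes,
                        epochs_per_stage=50, device="cpu"):
        trained = []                               # frozen, already-trained stages
        in_ch = 3
        for out_ch in widths:
            block = make_block(in_ch, out_ch).to(device)
            head = make_aux_head(out_ch, num_classes).to(device)
            params = list(block.parameters()) + list(head.parameters())
            # CIFAR-10 settings quoted above; batch size 128 belongs in the DataLoader.
            opt = torch.optim.SGD(params, lr=0.1, momentum=0.9)
            sched = torch.optim.lr_scheduler.StepLR(opt, step_size=15, gamma=0.2)
            for _ in range(epochs_per_stage):
                for x, y in train_loader:
                    x, y = x.to(device), y.to(device)
                    with torch.no_grad():          # earlier stages receive no gradient
                        for b in trained:
                            x = b(x)
                    loss = F.cross_entropy(head(block(x)), y)
                    opt.zero_grad()
                    loss.backward()
                    opt.step()
                sched.step()
            block.eval()
            for p in block.parameters():           # freeze the finished stage
                p.requires_grad_(False)
            trained.append(block)
            in_ch = out_ch
        return nn.Sequential(*trained)             # stacked feature extractor

The auxiliary heads are discarded between stages; only the frozen convolutional stages are kept and reused as the fixed input to the next 1-hidden-layer problem.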
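
For the ImageNet rows of the setup, the per-stage optimizer and schedule quoted above can be reproduced as follows. This is a sketch under the stated hyperparameters only (SGD, momentum 0.9, weight decay 1e-4, lr 0.1 decayed by 0.1 every 20 epochs, 45 epochs per layer); the helper name and the example module are ours, and the batch size of 256 and 4-GPU setup are data-loading/parallelism choices not shown here.

    import torch
    import torch.nn as nn


    def make_imagenet_stage_optimizer(stage: nn.Module):
        # Hyperparameters taken verbatim from the quoted ImageNet setup.
        optimizer = torch.optim.SGD(stage.parameters(), lr=0.1,
                                    momentum=0.9, weight_decay=1e-4)
        # lr decayed by 0.1 every 20 epochs; call scheduler.step() once per epoch
        # for the 45 epochs of each layerwise stage.
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
        return optimizer, scheduler


    # Usage with a placeholder stage module:
    opt, sched = make_imagenet_stage_optimizer(nn.Conv2d(3, 64, 3, padding=1))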