Proving the Lottery Ticket Hypothesis for Convolutional Neural Networks

Authors: Arthur da Cunha, Emanuele Natale, Laurent Viennot

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We also provide basic experiments showing that, starting from a random CNN which is roughly 30 times larger than LeNet5, it is possible to compute in a few hours a pruning mask that allows to approximate the trained convolutional layers of LeNet5 with relative error 10^-3, even when ignoring some hypotheses of our theoretical result. Our theoretical analysis follows the approach of Malach et al. (2020) and makes use of two layers to approximate one. We borrow from Pensia et al. (2020) the use of random subset sum (RSS) (Lueker, 1998) to approximate a given weight via the sum of a subset of a sample of random weights, and carefully design instances of RSS via a combination of two convolutional layers. [Section 3, Experiments] As networks with a higher parameter count tend to be more robust to noise, we stick to the small CNN architecture used by Pensia et al. (2020), namely LeNet5 (LeCun et al., 1989a) with ReLU activations. We conduct our experiments by first training the network to 98.99% test accuracy on the MNIST dataset (LeCun et al., 1998). To avoid well-known limitations of the MNIST dataset (in particular its large number of zero entries), we also trained it on the Fashion-MNIST dataset (Xiao et al., 2017) to 89.12% test accuracy." (A toy sketch of this subset-sum approximation appears after this table.)
Researcher Affiliation | Academia | Arthur C. W. da Cunha & Emanuele Natale, Inria Sophia Antipolis, Sophia Antipolis, France ({arthur.carvalho-walraven-da-cunha,emanuele.natale}@inria.fr); Laurent Viennot, Inria Paris, IRIF, Paris, France (laurent.viennot@inria.fr)
Pseudocode | No | The paper provides detailed proofs and mathematical formulations but does not include explicit pseudocode or algorithm blocks.
Open Source Code | Yes | "We also took care to initialize the random number generator not only for Julia but also for Gurobi, so that the experiments can be reproduced using the source code available at https://github.com/ArthurWalraven/cnnslth."
Open Datasets | Yes | "We conduct our experiments by first training the network to 98.99% test accuracy on the MNIST dataset (LeCun et al., 1998). To avoid well-known limitations of the MNIST dataset (in particular its large number of zero entries), we also trained it on the Fashion-MNIST dataset (Xiao et al., 2017) to 89.12% test accuracy."
Dataset Splits | No | The paper mentions training and testing on MNIST and Fashion-MNIST, but does not specify the train/validation/test dataset splits (e.g., percentages or counts).
Hardware Specification | Yes | "Solving this subset sum problem with n = 30 for the 2572 parameters in the convolutional layers of LeNet takes around 1 hour on 32 cores of an Intel Xeon Gold 6240 CPU @ 2.60GHz." (A hedged sketch of one such subset-sum instance posed as a mixed-integer program follows this table.)
Software Dependencies | No | "We adopted Kaiming Uniform (He et al., 2015) for weight initialization, a batch size of 64, and trained for 50 epochs using the ADAM optimizer (Kingma & Ba, 2015) with a learning rate of 0.001, exponential decay of 0.9, and momentum estimate of 0.999, the default values in the Flux.jl (Innes et al., 2018) machine learning library. We also took care to initialize the random number generator not only for Julia but also for Gurobi, so that the experiments can be reproduced using the source code available at https://github.com/ArthurWalraven/cnnslth." While software is mentioned, specific version numbers for Flux.jl or Gurobi (e.g., Gurobi 9.5.1) are not provided, only the year of their publications/manuals.
Experiment Setup | Yes | "We adopted Kaiming Uniform (He et al., 2015) for weight initialization, a batch size of 64, and trained for 50 epochs using the ADAM optimizer (Kingma & Ba, 2015) with a learning rate of 0.001, exponential decay of 0.9, and momentum estimate of 0.999, the default values in the Flux.jl (Innes et al., 2018) machine learning library." (A hedged PyTorch restatement of this configuration is sketched after this table.)
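
The "Research Type" row quotes the paper's use of random subset sum (RSS): each target weight is approximated by the sum of a subset of freshly sampled random weights. Below is a minimal, self-contained Python sketch of that phenomenon only; the sample size n = 16, the uniform sampling ranges, and the brute-force search are illustrative assumptions, not the authors' construction (which combines two convolutional layers and solves larger instances with Gurobi).

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)


def best_subset_sum(target, samples):
    """Brute-force the subset of `samples` whose sum is closest to `target`."""
    best_err, best_subset = float("inf"), ()
    for r in range(len(samples) + 1):
        for subset in itertools.combinations(range(len(samples)), r):
            err = abs(target - samples[list(subset)].sum())
            if err < best_err:
                best_err, best_subset = err, subset
    return best_subset, best_err


# Illustrative size: n = 16 random candidates per target weight
# (the paper reports n = 30 per parameter, solved with Gurobi rather than brute force).
n = 16
target = rng.uniform(-0.5, 0.5)           # a "trained" weight to approximate
samples = rng.uniform(-1.0, 1.0, size=n)  # random candidate weights

subset, err = best_subset_sum(target, samples)
print(f"target = {target:+.6f}, achieved error = {err:.2e}, subset size = {len(subset)}")
```

Even with only 16 candidates, the achieved error typically lands well below the 10^-3 relative error quoted above, which is the RSS effect the paper's two-layer construction exploits.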
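The "Hardware Specification" row refers to solving subset-sum instances (one per target weight, with n = 30 candidates each) using Gurobi. The sketch below shows how a single such instance could be posed as a small mixed-integer program in gurobipy; the candidate weights, the target, and the absolute-error objective are assumptions for illustration and need not match the authors' exact formulation.

```python
import gurobipy as gp
import numpy as np
from gurobipy import GRB

rng = np.random.default_rng(0)
n = 30                                   # sample size reported in the paper
target = float(rng.uniform(-0.5, 0.5))   # one trained weight to approximate (illustrative)
a = rng.uniform(-1.0, 1.0, size=n)       # random candidate weights (illustrative)

m = gp.Model("rss_one_weight")
pick = m.addVars(n, vtype=GRB.BINARY, name="pick")  # subset indicator per candidate
err = m.addVar(lb=0.0, name="abs_err")              # absolute approximation error

gap = gp.quicksum(float(a[i]) * pick[i] for i in range(n)) - target
m.addConstr(err >= gap)    # err >= +(subset sum - target)
m.addConstr(err >= -gap)   # err >= -(subset sum - target), hence err >= |gap|
m.setObjective(err, GRB.MINIMIZE)
m.optimize()

chosen = [i for i in range(n) if pick[i].X > 0.5]
print(f"error = {err.X:.2e}, picked {len(chosen)} of {n} candidates")
```

A single instance like this is tiny; the roughly one-hour runtime quoted above covers the 2572 convolutional parameters of LeNet.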
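The "Software Dependencies" and "Experiment Setup" rows describe the training configuration (Kaiming Uniform initialization, batch size 64, 50 epochs, ADAM with learning rate 0.001 and moment decays 0.9/0.999), which the authors implemented in Julia with Flux.jl. Purely as an illustration, the sketch below restates those hyperparameters in PyTorch; the LeNet5-style layer sizes and the use of torchvision are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# LeNet5-style CNN with ReLU activations (exact layer sizes assumed for illustration).
model = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
    nn.Linear(120, 84), nn.ReLU(),
    nn.Linear(84, 10),
)

# Kaiming Uniform initialization, as reported in the paper.
for layer in model.modules():
    if isinstance(layer, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_uniform_(layer.weight)
        nn.init.zeros_(layer.bias)

# Reported hyperparameters: batch size 64, 50 epochs, ADAM(lr=0.001, betas=(0.9, 0.999)).
train_set = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=64, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
loss_fn = nn.CrossEntropyLoss()

for epoch in range(50):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```

Swapping `datasets.MNIST` for `datasets.FashionMNIST` gives the Fashion-MNIST variant mentioned above.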