Proving the Lottery Ticket Hypothesis for Convolutional Neural Networks

Authors: Arthur da Cunha, Emanuele Natale, Laurent Viennot

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We also provide basic experiments showing that, starting from a random CNN which is roughly 30 times larger than LeNet5, it is possible to compute in a few hours a pruning mask that allows to approximate the trained convolutional layers of LeNet5 with relative error 10^-3, even when ignoring some hypotheses of our theoretical result. Our theoretical analysis follows the approach of Malach et al. (2020) and makes use of two layers to approximate one. We borrow from Pensia et al. (2020) the use of random subset sum (RSS) (Lueker, 1998) to approximate a given weight via the sum of a subset of a sample of random weights, and carefully design instances of RSS via a combination of two convolutional layers. [Section 3, Experiments] As networks with a higher parameter count tend to be more robust to noise, we stick to the small CNN architecture used by Pensia et al. (2020), namely LeNet5 (LeCun et al., 1989a) with ReLU activations. We conduct our experiments by first training the network to 98.99% test accuracy on the MNIST dataset (LeCun et al., 1998). To avoid well-known limitations of the MNIST dataset (in particular its large number of zero entries), we also trained it on the Fashion-MNIST dataset (Xiao et al., 2017) to 89.12% test accuracy." (A toy sketch of this subset-sum approximation appears after this table.)
Researcher Affiliation | Academia | Arthur C. W. da Cunha & Emanuele Natale, Inria Sophia Antipolis, Sophia Antipolis, France ({arthur.carvalho-walraven-da-cunha,emanuele.natale}@inria.fr); Laurent Viennot, Inria Paris, IRIF, Paris, France (laurent.viennot@inria.fr)
Pseudocode | No | The paper provides detailed proofs and mathematical formulations but does not include explicit pseudocode or algorithm blocks.
Open Source Code | Yes | "We also took care to initialize the random number generator not only for Julia but also for Gurobi, so that the experiments can be reproduced using the source code available at https://github.com/ArthurWalraven/cnnslth."
Open Datasets | Yes | "We conduct our experiments by first training the network to 98.99% test accuracy on the MNIST dataset (LeCun et al., 1998). To avoid well-known limitations of the MNIST dataset (in particular its large number of zero entries), we also trained it on the Fashion-MNIST dataset (Xiao et al., 2017) to 89.12% test accuracy."
Dataset Splits | No | The paper mentions training and testing on MNIST and Fashion-MNIST, but does not specify the train/validation/test dataset splits (e.g., percentages or counts).
Hardware Specification | Yes | "Solving this subset sum problem with n = 30 for the 2572 parameters in the convolutional layers of LeNet takes around 1 hour on 32 cores of an Intel Xeon Gold 6240 CPU @ 2.60GHz." (A hedged sketch of one such subset-sum instance posed as a mixed-integer program follows this table.)
Software Dependencies | No | "We adopted Kaiming Uniform (He et al., 2015) for weight initialization, a batch size of 64, and trained for 50 epochs using the ADAM optimizer (Kingma & Ba, 2015) with a learning rate of 0.001, exponential decay of 0.9, and momentum estimate of 0.999, the default values in the Flux.jl (Innes et al., 2018) machine learning library. We also took care to initialize the random number generator not only for Julia but also for Gurobi, so that the experiments can be reproduced using the source code available at https://github.com/ArthurWalraven/cnnslth." While software is mentioned, specific version numbers for Flux.jl or Gurobi (e.g., Gurobi 9.5.1) are not provided, only the year of their publications/manuals.
Experiment Setup | Yes | "We adopted Kaiming Uniform (He et al., 2015) for weight initialization, a batch size of 64, and trained for 50 epochs using the ADAM optimizer (Kingma & Ba, 2015) with a learning rate of 0.001, exponential decay of 0.9, and momentum estimate of 0.999, the default values in the Flux.jl (Innes et al., 2018) machine learning library." (A hedged PyTorch restatement of this configuration is sketched after this table.)
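
The "Research Type" row quotes the paper's use of random subset sum (RSS): each target weight is approximated by the sum of a subset of freshly sampled random weights. Below is a minimal, self-contained Python sketch of that phenomenon only; the sample size n = 16, the uniform sampling ranges, and the brute-force search are illustrative assumptions, not the authors' construction (which combines two convolutional layers and solves larger instances with Gurobi).

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)


def best_subset_sum(target, samples):
    """Brute-force the subset of `samples` whose sum is closest to `target`."""
    best_err, best_subset = float("inf"), ()
    for r in range(len(samples) + 1):
        for subset in itertools.combinations(range(len(samples)), r):
            err = abs(target - samples[list(subset)].sum())
            if err < best_err:
                best_err, best_subset = err, subset
    return best_subset, best_err


# Illustrative size: n = 16 random candidates per target weight
# (the paper reports n = 30 per parameter, solved with Gurobi rather than brute force).
n = 16
target = rng.uniform(-0.5, 0.5)           # a "trained" weight to approximate
samples = rng.uniform(-1.0, 1.0, size=n)  # random candidate weights

subset, err = best_subset_sum(target, samples)
print(f"target = {target:+.6f}, achieved error = {err:.2e}, subset size = {len(subset)}")
```

Even with only 16 candidates, the achieved error typically lands well below the 10^-3 relative error quoted above, which is the RSS effect the paper's two-layer construction exploits.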
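The "Hardware Specification" row refers to solving subset-sum instances (one per target weight, with n = 30 candidates each) using Gurobi. The sketch below shows how a single such instance could be posed as a small mixed-integer program in gurobipy; the candidate weights, the target, and the absolute-error objective are assumptions for illustration and need not match the authors' exact formulation.

```python
import gurobipy as gp
import numpy as np
from gurobipy import GRB

rng = np.random.default_rng(0)
n = 30                                   # sample size reported in the paper
target = float(rng.uniform(-0.5, 0.5))   # one trained weight to approximate (illustrative)
a = rng.uniform(-1.0, 1.0, size=n)       # random candidate weights (illustrative)

m = gp.Model("rss_one_weight")
pick = m.addVars(n, vtype=GRB.BINARY, name="pick")  # subset indicator per candidate
err = m.addVar(lb=0.0, name="abs_err")              # absolute approximation error

gap = gp.quicksum(float(a[i]) * pick[i] for i in range(n)) - target
m.addConstr(err >= gap)    # err >= +(subset sum - target)
m.addConstr(err >= -gap)   # err >= -(subset sum - target), hence err >= |gap|
m.setObjective(err, GRB.MINIMIZE)
m.optimize()

chosen = [i for i in range(n) if pick[i].X > 0.5]
print(f"error = {err.X:.2e}, picked {len(chosen)} of {n} candidates")
```

A single instance like this is tiny; the roughly one-hour runtime quoted above covers the 2572 convolutional parameters of LeNet.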
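The "Software Dependencies" and "Experiment Setup" rows describe the training configuration (Kaiming Uniform initialization, batch size 64, 50 epochs, ADAM with learning rate 0.001 and moment decays 0.9/0.999), which the authors implemented in Julia with Flux.jl. Purely as an illustration, the sketch below restates those hyperparameters in PyTorch; the LeNet5-style layer sizes and the use of torchvision are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# LeNet5-style CNN with ReLU activations (exact layer sizes assumed for illustration).
model = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
    nn.Linear(120, 84), nn.ReLU(),
    nn.Linear(84, 10),
)

# Kaiming Uniform initialization, as reported in the paper.
for layer in model.modules():
    if isinstance(layer, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_uniform_(layer.weight)
        nn.init.zeros_(layer.bias)

# Reported hyperparameters: batch size 64, 50 epochs, ADAM(lr=0.001, betas=(0.9, 0.999)).
train_set = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=64, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
loss_fn = nn.CrossEntropyLoss()

for epoch in range(50):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```

Swapping `datasets.MNIST` for `datasets.FashionMNIST` gives the Fashion-MNIST variant mentioned above.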