Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Proving the Lottery Ticket Hypothesis for Convolutional Neural Networks
Authors: Arthur da Cunha, Emanuele Natale, Laurent Viennot
ICLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also provide basic experiments showing that starting from a random CNN which is roughly 30 times larger than LeNet5, it is possible to compute in a few hours a pruning mask that allows to approximate the trained convolutional layers of LeNet5 with relative error 10⁻³, even when ignoring some hypotheses of our theoretical result. Our theoretical analysis follows the approach of Malach et al. (2020) and makes use of two layers to approximate one. We borrow from Pensia et al. (2020) the use of random subset sum (RSS) (Lueker, 1998) to approximate a given weight via the sum of a subset of a sample of random weights, and carefully design instances of RSS via a combination of two convolutional layers. [Section 3, Experiments] As networks with higher parameter count tend to be more robust to noise, we stick to the small CNN architecture used by Pensia et al. (2020), namely, LeNet5 (LeCun et al., 1989a) with ReLU activations. We conduct our experiments by first training the network to 98.99% test accuracy on the MNIST dataset (LeCun et al., 1998). To avoid well-known limitations of the MNIST dataset (in particular its large number of zero entries), we also trained it on the Fashion-MNIST dataset (Xiao et al., 2017) to 89.12% test accuracy. |
| Researcher Affiliation | Academia | Arthur C. W. da Cunha & Emanuele Natale, Inria Sophia Antipolis, Sophia Antipolis, France; Laurent Viennot, Inria Paris, IRIF, Paris, France |
| Pseudocode | No | The paper provides detailed proofs and mathematical formulations but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | We also took care to initialize the random number generator not only for Julia but also for Gurobi, so that the experiments can be reproduced using the source code available at https://github.com/ArthurWalraven/cnnslth. |
| Open Datasets | Yes | We conduct our experiments by first training the network to 98.99% test accuracy on the MNIST dataset (LeCun et al., 1998). To avoid well-known limitations of the MNIST dataset (in particular its large number of zero entries), we also trained it on the Fashion-MNIST dataset (Xiao et al., 2017) to 89.12% test accuracy. |
| Dataset Splits | No | The paper mentions training and testing on MNIST and Fashion-MNIST, but does not specify the train/validation/test dataset splits (e.g., percentages or counts). |
| Hardware Specification | Yes | Solving this subset sum problem with n = 30 for the 2572 parameters in the convolutional layers of LeNet takes around 1 hour on 32 cores of an Intel Xeon Gold 6240 CPU @ 2.60GHz. |
| Software Dependencies | No | We adopted Kaiming Uniform (He et al., 2015) for weight initialization, a batch size of 64 and trained for 50 epochs using ADAM optimizer (Kingma & Ba, 2015) with learning rate of 0.001, exponential decay of 0.9 and momentum estimate of 0.999, the default values in Flux.jl (Innes et al., 2018) machine learning library. We also took care to initialize the random number generator not only for Julia but also for Gurobi, so that the experiments can be reproduced using the source code available at https://github.com/ArthurWalraven/cnnslth. While software is mentioned, specific version numbers for Flux.jl or Gurobi are not provided, only the years of their publications/manuals. |
| Experiment Setup | Yes | We adopted Kaiming Uniform (He et al., 2015) for weight initialization, a batch size of 64 and trained for 50 epochs using ADAM optimizer (Kingma & Ba, 2015) with learning rate of 0.001, exponential decay of 0.9 and momentum estimate of 0.999, the default values in Flux.jl (Innes et al., 2018) machine learning library. |
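
The Experiment Setup and Software Dependencies rows above fix the reported training configuration: Kaiming-uniform initialization, batch size 64, 50 epochs, and Adam with learning rate 0.001 and decay rates 0.9/0.999, the Flux.jl defaults. The following is a minimal Flux.jl sketch of such a setup, not the authors' code; the LeNet5 layer sizes and the dummy MNIST-shaped data are illustrative assumptions.

```julia
using Flux

# Sketch of the reported setup (assumed classic LeNet5 layer sizes, ReLU activations,
# Kaiming-uniform initialization); dummy data stands in for MNIST / Fashion-MNIST.
init = Flux.kaiming_uniform
model = Chain(
    Conv((5, 5), 1 => 6, relu; init),
    MaxPool((2, 2)),
    Conv((5, 5), 6 => 16, relu; init),
    MaxPool((2, 2)),
    Flux.flatten,
    Dense(256 => 120, relu; init),
    Dense(120 => 84, relu; init),
    Dense(84 => 10; init),
)

xtrain = rand(Float32, 28, 28, 1, 256)             # placeholder images
ytrain = Flux.onehotbatch(rand(0:9, 256), 0:9)     # placeholder labels
loader = Flux.DataLoader((xtrain, ytrain); batchsize = 64, shuffle = true)

# Adam with learning rate 0.001 and decay rates (0.9, 0.999), as quoted above.
opt = Flux.setup(Adam(0.001, (0.9, 0.999)), model)
for epoch in 1:50
    for (x, y) in loader
        grads = Flux.gradient(m -> Flux.logitcrossentropy(m(x), y), model)
        Flux.update!(opt, model, grads[1])
    end
end
```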
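
The Research Type and Hardware Specification rows quote the paper's use of random subset sum (RSS): each target weight is approximated by the sum of a subset of n random samples, which the authors solve with Gurobi for n = 30. The sketch below only illustrates that subset-sum approximation with a brute-force search over a smaller n; it is not the authors' Gurobi-based implementation.

```julia
# Brute-force illustration of approximating a target value by a subset sum of
# random samples (the RSS idea quoted above). The paper uses n = 30 and a MIP
# solver; n = 16 keeps exhaustive enumeration cheap here.
function best_subset_sum(target::Float64, samples::Vector{Float64})
    n = length(samples)
    best_err, best_mask = Inf, 0
    for mask in 0:(2^n - 1)                       # every subset as a bitmask
        s = 0.0
        for i in 1:n
            if isodd(mask >> (i - 1))
                s += samples[i]
            end
        end
        err = abs(target - s)
        if err < best_err
            best_err, best_mask = err, mask
        end
    end
    return best_err, best_mask
end

samples = 2 .* rand(16) .- 1                      # 16 uniform samples in [-1, 1]
err, mask = best_subset_sum(0.37, samples)
println("absolute error ≈ ", err)
```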