The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

Authors: Jonathan Frankle, Michael Carbin

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis and the importance of these fortuitous initializations. We consistently find winning tickets that are less than 10-20% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10."
Researcher Affiliation | Academia | Jonathan Frankle, MIT CSAIL, jfrankle@csail.mit.edu; Michael Carbin, MIT CSAIL, mcarbin@csail.mit.edu
Pseudocode | Yes | "Strategy 1: Iterative pruning with resetting. 1. Randomly initialize a neural network f(x; m ⊙ θ) where θ = θ_0 and m = 1^|θ| is a mask. 2. Train the network for j iterations, reaching parameters m ⊙ θ_j. 3. Prune s% of the parameters, creating an updated mask m′ where P_m′ = (P_m - s)%. 4. Reset the weights of the remaining portion of the network to their values in θ_0. That is, let θ = θ_0. 5. Let m = m′ and repeat steps 2 through 4 until a sufficiently pruned network has been obtained." (A hedged code sketch of this procedure appears after the table.)
Open Source Code | No | The paper does not state that the authors' source code is publicly available and does not link to a code repository for their methodology.
Open Datasets | Yes | "We consistently find winning tickets that are less than 10-20% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10."
Dataset Splits | Yes | "We randomly sampled a 5,000-example validation set from the training set and used the remaining 55,000 training examples as our training set for the rest of the paper (including Section 2)."
Hardware Specification | No | "We gratefully acknowledge IBM, which through the MIT-IBM Watson AI Lab contributed the computational resources necessary to conduct the experiments in this paper." (This acknowledgement does not specify exact hardware models or configurations.)
Software Dependencies | No | The paper mentions software components and optimizers (e.g., the Adam optimizer, SGD, dropout, batch normalization) but does not give specific version numbers for any of them (e.g., 'PyTorch 1.9', 'TensorFlow 2.x').
Experiment Setup | Yes | "The training set is presented to the network in mini-batches of 60 examples; at each epoch, the entire training set is shuffled." and "We use the Adam optimizer (Kingma & Ba, 2014) and Gaussian Glorot initialization (Glorot & Bengio, 2010)." and "We use a batch size of 128. We use batch normalization. We use weight decay of 0.0001." (A hedged configuration sketch combining these quotes with the dataset split above appears after the table.)
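
The pseudocode row above can be illustrated concretely. Below is a minimal PyTorch sketch of iterative magnitude pruning with resetting. The layer sizes, the 20% per-round pruning rate, the five pruning rounds, the synthetic training data, and the choice to leave biases unpruned are illustrative assumptions, not the authors' released implementation (the paper does not link any code).

```python
import copy
import torch
import torch.nn as nn

def make_model():
    # Small fully-connected network; layer sizes here are an assumption.
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(784, 300), nn.ReLU(),
        nn.Linear(300, 100), nn.ReLU(),
        nn.Linear(100, 10),
    )

def apply_mask(model, masks):
    # Zero out pruned weights so only the masked subnetwork f(x; m ⊙ θ) is active.
    with torch.no_grad():
        for name, param in model.named_parameters():
            param.mul_(masks[name])

def prune_by_magnitude(model, masks, s):
    # Step 3: prune the fraction s of surviving weights with the smallest magnitude.
    new_masks = {}
    for name, param in model.named_parameters():
        mask = masks[name]
        if "weight" not in name:                      # leave biases unpruned (assumption)
            new_masks[name] = mask
            continue
        surviving = param.detach().abs()[mask.bool()]
        k = int(s * surviving.numel())
        if k == 0:
            new_masks[name] = mask
            continue
        threshold = surviving.sort().values[k]
        new_masks[name] = mask * (param.detach().abs() >= threshold).float()
    return new_masks

def train(model, masks, steps=100):
    # Step 2: train for j iterations. Synthetic data is used only so the sketch
    # runs end to end; a real reproduction would iterate over MNIST mini-batches.
    optimizer = torch.optim.Adam(model.parameters())
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        x = torch.randn(60, 1, 28, 28)                # dummy mini-batch of 60 examples
        y = torch.randint(0, 10, (60,))
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
        apply_mask(model, masks)                      # keep pruned weights at zero

# Step 1: randomly initialize f(x; m ⊙ θ) with θ = θ_0 and m = 1^|θ|.
model = make_model()
theta_0 = copy.deepcopy(model.state_dict())
masks = {name: torch.ones_like(p) for name, p in model.named_parameters()}

for pruning_round in range(5):                        # until sufficiently pruned
    train(model, masks)                               # Step 2
    masks = prune_by_magnitude(model, masks, s=0.2)   # Step 3: m' with P_m' = (P_m - s)%
    model.load_state_dict(theta_0)                    # Step 4: reset weights to θ_0
    apply_mask(model, masks)                          # Step 5: m ← m', then repeat
```

Note that the mask is re-applied after every optimizer step so that pruned weights stay at zero throughout training; a faithful reproduction would use the real MNIST training set and the paper's hyperparameters rather than the dummy data used here to keep the sketch self-contained.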
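
The dataset-split and experiment-setup rows can likewise be sketched as a training configuration. Only the 55,000/5,000 split, the mini-batch size of 60 with per-epoch shuffling, the Adam optimizer, and Gaussian Glorot (Xavier normal) initialization come from the quotes above; the torchvision data pipeline, the Lenet-300-100-style layer sizes, and PyTorch's default Adam learning rate are assumptions. The batch size of 128, batch normalization, and weight decay of 0.0001 from the last quote belong to a different configuration than the fully-connected MNIST setup sketched here and are omitted.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# "We randomly sampled a 5,000-example validation set from the training set and
#  used the remaining 55,000 training examples as our training set."
full_train = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())
train_set, val_set = random_split(full_train, [55_000, 5_000])

# Fully-connected network; the Lenet-300-100-style layer sizes are an assumption.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 300), nn.ReLU(),
    nn.Linear(300, 100), nn.ReLU(),
    nn.Linear(100, 10),
)

# "Gaussian Glorot initialization" for the weights; zero-initialized biases are an assumption.
for layer in model:
    if isinstance(layer, nn.Linear):
        nn.init.xavier_normal_(layer.weight)
        nn.init.zeros_(layer.bias)

# "We use the Adam optimizer"; the learning rate is left at PyTorch's default here.
optimizer = torch.optim.Adam(model.parameters())

# "mini-batches of 60 examples; at each epoch, the entire training set is shuffled"
train_loader = DataLoader(train_set, batch_size=60, shuffle=True)
```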