Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
Authors: Jonathan Frankle, Michael Carbin
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis and the importance of these fortuitous initializations. We consistently find winning tickets that are less than 10-20% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10. |
| Researcher Affiliation | Academia | Jonathan Frankle (MIT CSAIL); Michael Carbin (MIT CSAIL) |
| Pseudocode | Yes | Strategy 1: Iterative pruning with resetting. 1. Randomly initialize a neural network f(x; m ⊙ θ), where θ = θ₀ and m = 1^\|θ\| is a mask. 2. Train the network for j iterations, reaching parameters m ⊙ θ_j. 3. Prune s% of the parameters, creating an updated mask m′ where P_m′ = (P_m − s)%. 4. Reset the weights of the remaining portion of the network to their values in θ₀. That is, let θ = θ₀. 5. Let m = m′ and repeat steps 2 through 4 until a sufficiently pruned network has been obtained. |
| Open Source Code | No | The paper does not contain an explicit statement about the authors' source code being made available or a link to a code repository for their methodology. |
| Open Datasets | Yes | We consistently find winning tickets that are less than 10-20% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10. |
| Dataset Splits | Yes | We randomly sampled a 5,000-example validation set from the training set and used the remaining 55,000 training examples as our training set for the rest of the paper (including Section 2). |
| Hardware Specification | No | "We gratefully acknowledge IBM, which through the MIT-IBM Watson AI Lab contributed the computational resources necessary to conduct the experiments in this paper." (This does not specify exact hardware models or configurations.) |
| Software Dependencies | No | The paper mentions optimizers and training techniques (e.g., 'Adam optimizer', 'SGD', 'dropout', 'batchnorm') but does not specify software libraries with version numbers (e.g., 'PyTorch 1.9', 'TensorFlow 2.x'). |
| Experiment Setup | Yes | "The training set is presented to the network in mini-batches of 60 examples; at each epoch, the entire training set is shuffled." and "We use the Adam optimizer (Kingma & Ba, 2014) and Gaussian Glorot initialization (Glorot & Bengio, 2010)." and "We use a batch size of 128. We use batch normalization. We use weight decay of 0.0001." |
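The Strategy 1 procedure quoted in the Pseudocode row can be illustrated with a small, self-contained Python sketch. It is not the authors' code (the table records that none was linked) and rests on several assumptions: a toy linear model trained by plain gradient descent stands in for the neural network f(x; m ⊙ θ), each round removes the s% of *remaining* weights with the smallest magnitude, and the function names `train`, `prune`, and `iterative_pruning_with_resetting` are hypothetical.

```python
import random


def train(theta, mask, data, steps, lr=0.1):
    """Hypothetical stand-in for 'train the network for j iterations'.

    Runs plain gradient descent on the mean squared error of a toy linear
    model y ~ sum_i (m_i * theta_i) * x_i. Pruned weights (m_i == 0) get a
    zero gradient, so they never move.
    """
    theta = list(theta)
    n = len(data)
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for x, y in data:
            pred = sum(m * w * xi for m, w, xi in zip(mask, theta, x))
            err = pred - y
            for i, xi in enumerate(x):
                grad[i] += 2.0 * err * xi * mask[i] / n
        theta = [w - lr * g for w, g in zip(theta, grad)]
    return theta


def prune(theta, mask, s):
    """Remove s% of the still-unpruned weights with the smallest |theta_i|.

    Assumption: magnitude pruning, applied to the remaining weights.
    At least one weight is pruned per call so the outer loop terminates.
    """
    alive = [i for i, m in enumerate(mask) if m == 1]
    k = max(1, round(len(alive) * s / 100))
    smallest = sorted(alive, key=lambda i: abs(theta[i]))[:k]
    new_mask = list(mask)
    for i in smallest:
        new_mask[i] = 0
    return new_mask


def iterative_pruning_with_resetting(theta0, data, j, s, target_fraction):
    mask = [1] * len(theta0)  # step 1: m is all ones, theta = theta0
    theta = list(theta0)
    while sum(mask) / len(mask) > target_fraction:
        theta_j = train(theta, mask, data, j)  # step 2: train j iterations
        mask = prune(theta_j, mask, s)         # step 3: prune s% of survivors
        theta = list(theta0)                   # step 4: reset weights to theta0
    return mask, theta                         # step 5 is the loop itself


# Toy usage (synthetic data): only the first two features affect the target.
random.seed(0)
n_features = 10
true_w = [3.0, -2.0] + [0.0] * (n_features - 2)
data = []
for _ in range(40):
    x = [random.uniform(-1.0, 1.0) for _ in range(n_features)]
    data.append((x, sum(w * xi for w, xi in zip(true_w, x))))
theta0 = [random.gauss(0.0, 0.1) for _ in range(n_features)]

mask, theta = iterative_pruning_with_resetting(
    theta0, data, j=300, s=20, target_fraction=0.2
)
print("surviving mask:", mask)
```

On this synthetic data the surviving mask should keep only the two weights that actually drive the target, and the returned weights are exactly θ₀, so the pair (mask, θ₀) plays the role of the candidate winning ticket. Resetting to θ₀ rather than re-randomizing is the point of Strategy 1: the sketch never draws fresh initial weights after the first round.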