Optimal Lottery Tickets via Subset Sum: Logarithmic Over-Parameterization is Sufficient

Authors: Ankit Pensia, Shashank Rajput, Alliot Nagle, Harit Vishwakarma, Dimitris Papailiopoulos

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We verify our results empirically by approximating a target network via SUBSETSUM in Experiment 1, and by pruning a sufficiently over-parameterized neural network that implements the structures in Figures 1b and 1c in Experiment 2. In both setups, we benchmark on the MNIST [33] dataset, and all training and pruning is accomplished with cosine annealing learning rate decay [34] on a batch size 64 with momentum 0.9 and weight decay 0.0005.
Researcher Affiliation | Academia | Ankit Pensia, University of Wisconsin-Madison, ankitp@cs.wisc.edu; Shashank Rajput, University of Wisconsin-Madison, rajput3@wisc.edu; Alliot Nagle, University of Wisconsin-Madison, acnagle@wisc.edu; Harit Vishwakarma, University of Wisconsin-Madison, hvishwakarma@cs.wisc.edu; Dimitris Papailiopoulos, University of Wisconsin-Madison, dimitris@papail.io
Pseudocode | No | The paper describes its mathematical proofs and experimental procedures in narrative text, but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include any explicit statement or link indicating that the source code for the methodology described in the paper is publicly available.
Open Datasets | Yes | In both setups, we benchmark on the MNIST [33] dataset
Dataset Splits | No | The paper mentions training on MNIST and achieving a "final test set accuracy", but it does not explicitly provide details about training/validation/test dataset splits (e.g., percentages or sample counts for each split).
Hardware Specification | Yes | The 397,000 weights in our target network were approximated with 3,725,871 coefficients in 21.5 hours on 36 cores of a c5.18xlarge AWS EC2 instance.
Software Dependencies | No | The paper mentions using "Gurobi's MIP solver" and cites its reference manual from 2020, but it does not provide a specific version number (e.g., Gurobi X.Y) for the software dependencies used.
Experiment Setup | Yes | ...all training and pruning is accomplished with cosine annealing learning rate decay [34] on a batch size 64 with momentum 0.9 and weight decay 0.0005.
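To make the two quoted procedures concrete, below are two minimal, hedged sketches; neither is the authors' code, and all model choices, learning rates, and epoch counts are placeholder assumptions. The first illustrates one plausible way to pose the SUBSETSUM approximation of a single target weight as a mixed-integer program with Gurobi (the paper reports using Gurobi's MIP solver, but its exact formulation is not given).

```python
# Sketch: approximate one target weight by a subset sum of random candidate weights.
# Assumed MIP formulation, not the authors' code; requires gurobipy and a Gurobi license.
import numpy as np
import gurobipy as gp
from gurobipy import GRB

rng = np.random.default_rng(0)
target = 0.37                               # illustrative target-network weight
candidates = rng.uniform(-1.0, 1.0, 20)     # random weights available to the subset sum

m = gp.Model("subset_sum_approx")
m.Params.OutputFlag = 0
z = m.addVars(len(candidates), vtype=GRB.BINARY, name="z")   # include candidate i or not
err = m.addVar(lb=0.0, name="abs_err")
total = gp.quicksum(float(candidates[i]) * z[i] for i in range(len(candidates)))
m.addConstr(err >= total - target)          # linearize |target - total|
m.addConstr(err >= target - total)
m.setObjective(err, GRB.MINIMIZE)
m.optimize()

chosen = [i for i in range(len(candidates)) if z[i].X > 0.5]
print(f"error {err.X:.2e} using {len(chosen)} of {len(candidates)} candidates")
```

The second sketch shows the quoted training configuration (SGD with momentum 0.9, weight decay 0.0005, batch size 64, cosine annealing learning rate decay) on MNIST; the network architecture, initial learning rate, and epoch count are assumptions for illustration only.

```python
# Sketch of the quoted training setup; hyperparameters not quoted above are placeholders.
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import CosineAnnealingLR
from torchvision import datasets, transforms

train_set = datasets.MNIST("./data", train=True, download=True,
                           transform=transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 500), nn.ReLU(), nn.Linear(500, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,      # initial LR is a placeholder
                            momentum=0.9, weight_decay=0.0005)
epochs = 10                                                   # placeholder epoch count
scheduler = CosineAnnealingLR(optimizer, T_max=epochs)
criterion = nn.CrossEntropyLoss()

for _ in range(epochs):
    for x, y in train_loader:
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
    scheduler.step()                                          # cosine annealing per epoch
```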