Optimal Lottery Tickets via Subset Sum: Logarithmic Over-Parameterization is Sufficient
Authors: Ankit Pensia, Shashank Rajput, Alliot Nagle, Harit Vishwakarma, Dimitris Papailiopoulos
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify our results empirically by approximating a target network via SUBSETSUM in Experiment 1, and by pruning a sufficiently over-parameterized neural network that implements the structures in Figures 1b and 1c in Experiment 2. In both setups, we benchmark on the MNIST [33] dataset, and all training and pruning is accomplished with cosine annealing learning rate decay [34] on a batch size 64 with momentum 0.9 and weight decay 0.0005. |
| Researcher Affiliation | Academia | Ankit Pensia, University of Wisconsin-Madison, ankitp@cs.wisc.edu; Shashank Rajput, University of Wisconsin-Madison, rajput3@wisc.edu; Alliot Nagle, University of Wisconsin-Madison, acnagle@wisc.edu; Harit Vishwakarma, University of Wisconsin-Madison, hvishwakarma@cs.wisc.edu; Dimitris Papailiopoulos, University of Wisconsin-Madison, dimitris@papail.io |
| Pseudocode | No | The paper describes mathematical proofs and experimental procedures in narrative text, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include any explicit statements or links indicating that the source code for the methodology described in the paper is publicly available. |
| Open Datasets | Yes | In both setups, we benchmark on the MNIST [33] dataset |
| Dataset Splits | No | The paper mentions training on MNIST and achieving a "final test set accuracy", but it does not explicitly provide details about training/validation/test dataset splits (e.g., percentages or sample counts for each split). |
| Hardware Specification | Yes | The 397,000 weights in our target network were approximated with 3,725,871 coefficients in 21.5 hours on 36 cores of a c5.18xlarge AWS EC2 instance. |
| Software Dependencies | No | The paper mentions using "Gurobi's MIP solver" and cites its reference manual from 2020, but it does not provide a specific version number (e.g., Gurobi X.Y) for the software dependencies used. |
| Experiment Setup | Yes | ...all training and pruning is accomplished with cosine annealing learning rate decay [34] on a batch size 64 with momentum 0.9 and weight decay 0.0005. |
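Experiment 1 approximates each target weight by the sum of a subset of random coefficients. The paper solves this subset-sum problem exactly with Gurobi's MIP solver; as a minimal illustrative sketch, a simple greedy heuristic (our substitution, not the paper's method) conveys the idea of selecting coefficients to drive the residual toward zero:

```python
import random

def greedy_subset_sum(target, coeffs):
    """Greedily pick a subset of coeffs whose sum approximates target.

    A heuristic stand-in for the exact MIP formulation used in the paper:
    scan coefficients from largest to smallest magnitude and keep any
    coefficient that shrinks the absolute residual.
    """
    residual = target
    chosen = []
    for c in sorted(coeffs, key=abs, reverse=True):
        if abs(residual - c) < abs(residual):
            chosen.append(c)
            residual -= c
    return chosen, residual

# Toy instance: one target weight, 40 random coefficients (illustrative sizes).
random.seed(0)
coeffs = [random.uniform(-1, 1) for _ in range(40)]
target = 0.537
subset, err = greedy_subset_sum(target, coeffs)
print(f"approximated {target} with {len(subset)} coefficients, residual {abs(err):.2e}")
```

The paper's theory shows that roughly log(1/epsilon) random coefficients per weight suffice for epsilon-accuracy, which is why only ~3.7M coefficients were needed for the 397,000-weight target network.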
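The quoted setup fixes batch size 64, momentum 0.9, and weight decay 0.0005, with cosine annealing learning rate decay [34]. A minimal sketch of that schedule in pure Python (the peak learning rate `lr_max` is an assumed placeholder; the paper does not state it in the quoted passage):

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max=0.1, lr_min=0.0):
    """Cosine annealing schedule of Loshchilov & Hutter [34]:
    eta_t = eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * t / T))."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * step / total_steps))

# Hyperparameters quoted in the paper; lr_max and total_steps are assumptions.
config = dict(batch_size=64, momentum=0.9, weight_decay=5e-4)
lrs = [cosine_annealing_lr(t, total_steps=100) for t in range(101)]
```

The schedule decays smoothly from `lr_max` at step 0 to `lr_min` at the final step, with no restarts.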