Scaling Up Exact Neural Network Compression by ReLU Stability

Authors: Thiago Serra, Xin Yu, Abhinav Kumar, Srikumar Ramalingam

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We trained and evaluated the compressibility of classifiers for the datasets MNIST [53], CIFAR-10 [48], and CIFAR-100 [48] with and without ℓ1 weight regularization, which is known to induce stability [90]. We refer to Appendix A6 for details on environment and implementation. We use the notation L×n for the architecture of L hidden layers with n neurons each. We started at L = 2 and n = 100, and then doubled the width n or incremented the depth L until the majority of the runs for MNIST classifiers for any configuration timed out after 3 hours. Based on preliminary runs, we chose ℓ1 values spanning from those for which accuracy improves as ℓ1 increases to those for which accuracy starts to decrease. We trained and evaluated neural networks with 5 different random initialization seeds for each choice of ℓ1. The amount of regularization used did not stabilize the entire layer. We refer to Appendix A7 for additional figures and tables with complete results. (A PyTorch sketch of the L×n architecture follows the table.)
Researcher Affiliation | Collaboration | Thiago Serra, Bucknell University, Lewisburg, PA, United States (thiago.serra@bucknell.edu); Xin Yu, University of Utah, Salt Lake City, UT, United States (xin.yu@utah.edu); Abhinav Kumar, Michigan State University, East Lansing, MI, United States (kumarab6@msu.edu); Srikumar Ramalingam, Google Research, New York, NY, United States (rsrikumar@google.com)
Pseudocode | Yes | Algorithm 1, which we denote ISA (Identifying Stable Activations), identifies all stable neurons of a neural network. (A simplified stability-check sketch follows the table.)
Open Source Code | Yes | The code is available at the following link: https://github.com/yuxwind/ExactCompression
Open Datasets | Yes | We trained and evaluated the compressibility of classifiers for the datasets MNIST [53], CIFAR-10 [48], and CIFAR-100 [48] with and without ℓ1 weight regularization, which is known to induce stability [90]. (A data-loading sketch follows the table.)
Dataset Splits | No | The paper does not explicitly state training/validation/test dataset splits with percentages or sample counts. While it refers to preprocessing on the 'training set' and evaluation on the 'test set', it does not define a separate validation set or its split.
Hardware Specification | Yes | All experiments are run on a Linux server with 40 CPUs, 180 GB memory, and an Nvidia GeForce RTX 2080 Ti GPU.
Software Dependencies | Yes | We use Gurobi 9.1 as the MILP solver and PyTorch 1.7.0 as the deep learning framework. (A solver usage sketch follows the table.)
Experiment Setup | Yes | For all networks, we used the Adam optimizer, trained for 50 epochs with an initial learning rate of 1e-3 and a batch size of 128. (A training-loop sketch follows the table.)
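
The sketches referenced in the table follow. First, for the Research Type row: a minimal PyTorch sketch (not the authors' code) of an L×n fully connected ReLU classifier, assuming MNIST-sized inputs of 784 features and 10 output classes; build_mlp is a hypothetical helper name.

```python
# Minimal sketch (not the authors' code) of an "L x n" fully connected ReLU
# classifier, e.g. the starting configuration of L = 2 hidden layers with
# n = 100 neurons each. Input/output sizes assume MNIST (784 features, 10 classes).
import torch.nn as nn

def build_mlp(L: int = 2, n: int = 100, in_dim: int = 784, out_dim: int = 10) -> nn.Sequential:
    layers, width = [], in_dim
    for _ in range(L):
        layers += [nn.Linear(width, n), nn.ReLU()]
        width = n
    layers.append(nn.Linear(width, out_dim))
    return nn.Sequential(*layers)
```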
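
For the Pseudocode row: the paper's Algorithm 1 (ISA) certifies neuron stability exactly. The sketch below only conveys the underlying idea using interval arithmetic over a box input domain, which is a cheaper, conservative proxy and not the paper's algorithm.

```python
# Sketch of the idea behind identifying stable ReLUs over a box input domain
# [lo, hi]. This is NOT the paper's Algorithm 1 (ISA), which certifies
# stability exactly; interval arithmetic only gives a conservative proxy:
# a unit whose preactivation upper bound is <= 0 is stably inactive, and one
# whose lower bound is >= 0 is stably active.
import torch
import torch.nn as nn

def interval_stability(model: nn.Sequential, lo: torch.Tensor, hi: torch.Tensor):
    stable = []  # one (stably_active, stably_inactive) pair of bool tensors per ReLU layer
    for layer in model:
        if isinstance(layer, nn.Linear):
            W, b = layer.weight.detach(), layer.bias.detach()
            W_pos, W_neg = W.clamp(min=0), W.clamp(max=0)
            # bounds on the preactivations of this layer
            lo, hi = W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b
        elif isinstance(layer, nn.ReLU):
            stable.append((lo >= 0, hi <= 0))
            lo, hi = lo.clamp(min=0), hi.clamp(min=0)
    return stable

# Example with pixel inputs in [0, 1] for an MNIST-sized model:
# stable = interval_stability(build_mlp(), torch.zeros(784), torch.ones(784))
```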
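
For the Open Datasets row: a minimal torchvision loading sketch; the ToTensor-only transform and the "data" directory are assumptions not stated in the excerpt.

```python
# Minimal sketch of loading the three benchmark datasets with torchvision.
# The transform (ToTensor only) and the "data" root directory are assumptions.
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def get_loaders(name: str = "MNIST", batch_size: int = 128, root: str = "data"):
    ds = {"MNIST": datasets.MNIST,
          "CIFAR10": datasets.CIFAR10,
          "CIFAR100": datasets.CIFAR100}[name]
    tfm = transforms.ToTensor()
    train = ds(root, train=True, download=True, transform=tfm)
    test = ds(root, train=False, download=True, transform=tfm)
    return (DataLoader(train, batch_size=batch_size, shuffle=True),
            DataLoader(test, batch_size=batch_size))
```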
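
For the Software Dependencies row: an illustration of how Gurobi enters the pipeline, not the paper's MILP formulation. For a first-layer neuron, the exact preactivation bounds over a box input domain reduce to two linear programs; deeper layers require the MILP encoding described in the paper.

```python
# Illustration only (not the paper's MILP formulation): for a first-layer
# neuron, the exact max/min preactivation over a box input domain reduces to
# two linear programs, so its stability can be certified with Gurobi directly.
import gurobipy as gp
from gurobipy import GRB

def first_layer_preactivation_bounds(w, b, lo, hi):
    """w: weight vector, b: bias, lo/hi: per-input lower/upper bounds (lists)."""
    m = gp.Model()
    m.Params.OutputFlag = 0                      # silence solver logging
    x = m.addVars(len(w), lb=lo, ub=hi)
    pre = gp.quicksum(w[j] * x[j] for j in range(len(w))) + b
    m.setObjective(pre, GRB.MAXIMIZE)
    m.optimize()
    upper = m.ObjVal
    m.setObjective(pre, GRB.MINIMIZE)
    m.optimize()
    lower = m.ObjVal
    # the neuron is stably inactive if upper <= 0, stably active if lower >= 0
    return lower, upper
```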
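
For the Experiment Setup row: a minimal training-loop sketch with the stated hyperparameters (Adam, initial learning rate 1e-3, 50 epochs, batch size 128) and an ℓ1 penalty added to the loss; the coefficient l1_coef and the choice to penalize all parameters are assumptions.

```python
# Minimal training-loop sketch with the stated hyperparameters (Adam, lr 1e-3,
# 50 epochs, batch size 128) plus an l1 weight penalty added to the loss.
# The l1 coefficient and penalizing all parameters (including biases) are
# assumptions; the paper sweeps several l1 values and trains 5 seeds per value.
import torch
import torch.nn as nn

def train(model, train_loader, l1_coef: float = 1e-4, epochs: int = 50, device: str = "cpu"):
    model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in train_loader:
            x, y = x.view(x.size(0), -1).to(device), y.to(device)  # flatten images for the MLP
            loss = ce(model(x), y)
            loss = loss + l1_coef * sum(p.abs().sum() for p in model.parameters())
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```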