Scaling Up Exact Neural Network Compression by ReLU Stability

Authors: Thiago Serra, Xin Yu, Abhinav Kumar, Srikumar Ramalingam

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We trained and evaluated the compressibility of classifiers for the datasets MNIST [53], CIFAR-10 [48], and CIFAR-100 [48] with and without ℓ1 weight regularization, which is known to induce stability [90]. We refer to Appendix A6 for details on environment and implementation. We use the notation L×n for the architecture of L hidden layers with n neurons each. We started at L = 2 and n = 100, and then doubled the width n or incremented the depth L until the majority of the runs for MNIST classifiers for any configuration timed out after 3 hours. Based on preliminary runs, we chose ℓ1 values spanning from those for which accuracy improves as ℓ1 increases to those for which accuracy starts to decrease. We trained and evaluated neural networks with 5 different random initialization seeds for each choice of ℓ1. The amount of regularization used did not stabilize the entire layer. We refer to Appendix A7 for additional figures and tables with complete results. (A PyTorch sketch of the L×n architecture follows the table.)
Researcher Affiliation | Collaboration | Thiago Serra, Bucknell University, Lewisburg, PA, United States (thiago.serra@bucknell.edu); Xin Yu, University of Utah, Salt Lake City, UT, United States (xin.yu@utah.edu); Abhinav Kumar, Michigan State University, East Lansing, MI, United States (kumarab6@msu.edu); Srikumar Ramalingam, Google Research, New York, NY, United States (rsrikumar@google.com)
Pseudocode | Yes | Algorithm 1, which we denote ISA (Identifying Stable Activations), identifies all stable neurons of a neural network. (A simplified stability-check sketch follows the table.)
Open Source Code | Yes | The code is available at the following link: https://github.com/yuxwind/ExactCompression
Open Datasets | Yes | We trained and evaluated the compressibility of classifiers for the datasets MNIST [53], CIFAR-10 [48], and CIFAR-100 [48] with and without ℓ1 weight regularization, which is known to induce stability [90]. (A data-loading sketch follows the table.)
Dataset Splits | No | The paper does not explicitly state training/validation/test dataset splits with percentages or sample counts. While it refers to preprocessing on the 'training set' and evaluation on the 'test set', it does not define a separate validation set or its split.
Hardware Specification | Yes | All experiments are run on a Linux server with 40 CPUs, 180 GB memory, and an Nvidia GeForce RTX 2080 Ti GPU.
Software Dependencies | Yes | We use Gurobi 9.1 as the MILP solver and PyTorch 1.7.0 as the deep learning framework. (A solver usage sketch follows the table.)
Experiment Setup | Yes | For all networks, we used the Adam optimizer, trained for 50 epochs with an initial learning rate of 1e-3 and a batch size of 128. (A training-loop sketch follows the table.)
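
The sketches referenced in the table follow. First, for the Research Type row: a minimal PyTorch sketch (not the authors' code) of an L×n fully connected ReLU classifier, assuming MNIST-sized inputs of 784 features and 10 output classes; build_mlp is a hypothetical helper name.

```python
# Minimal sketch (not the authors' code) of an "L x n" fully connected ReLU
# classifier, e.g. the starting configuration of L = 2 hidden layers with
# n = 100 neurons each. Input/output sizes assume MNIST (784 features, 10 classes).
import torch.nn as nn

def build_mlp(L: int = 2, n: int = 100, in_dim: int = 784, out_dim: int = 10) -> nn.Sequential:
    layers, width = [], in_dim
    for _ in range(L):
        layers += [nn.Linear(width, n), nn.ReLU()]
        width = n
    layers.append(nn.Linear(width, out_dim))
    return nn.Sequential(*layers)
```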
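
For the Pseudocode row: the paper's Algorithm 1 (ISA) certifies neuron stability exactly. The sketch below only conveys the underlying idea using interval arithmetic over a box input domain, which is a cheaper, conservative proxy and not the paper's algorithm.

```python
# Sketch of the idea behind identifying stable ReLUs over a box input domain
# [lo, hi]. This is NOT the paper's Algorithm 1 (ISA), which certifies
# stability exactly; interval arithmetic only gives a conservative proxy:
# a unit whose preactivation upper bound is <= 0 is stably inactive, and one
# whose lower bound is >= 0 is stably active.
import torch
import torch.nn as nn

def interval_stability(model: nn.Sequential, lo: torch.Tensor, hi: torch.Tensor):
    stable = []  # one (stably_active, stably_inactive) pair of bool tensors per ReLU layer
    for layer in model:
        if isinstance(layer, nn.Linear):
            W, b = layer.weight.detach(), layer.bias.detach()
            W_pos, W_neg = W.clamp(min=0), W.clamp(max=0)
            # bounds on the preactivations of this layer
            lo, hi = W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b
        elif isinstance(layer, nn.ReLU):
            stable.append((lo >= 0, hi <= 0))
            lo, hi = lo.clamp(min=0), hi.clamp(min=0)
    return stable

# Example with pixel inputs in [0, 1] for an MNIST-sized model:
# stable = interval_stability(build_mlp(), torch.zeros(784), torch.ones(784))
```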
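
For the Open Datasets row: a minimal torchvision loading sketch; the ToTensor-only transform and the "data" directory are assumptions not stated in the excerpt.

```python
# Minimal sketch of loading the three benchmark datasets with torchvision.
# The transform (ToTensor only) and the "data" root directory are assumptions.
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def get_loaders(name: str = "MNIST", batch_size: int = 128, root: str = "data"):
    ds = {"MNIST": datasets.MNIST,
          "CIFAR10": datasets.CIFAR10,
          "CIFAR100": datasets.CIFAR100}[name]
    tfm = transforms.ToTensor()
    train = ds(root, train=True, download=True, transform=tfm)
    test = ds(root, train=False, download=True, transform=tfm)
    return (DataLoader(train, batch_size=batch_size, shuffle=True),
            DataLoader(test, batch_size=batch_size))
```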
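
For the Software Dependencies row: an illustration of how Gurobi enters the pipeline, not the paper's MILP formulation. For a first-layer neuron, the exact preactivation bounds over a box input domain reduce to two linear programs; deeper layers require the MILP encoding described in the paper.

```python
# Illustration only (not the paper's MILP formulation): for a first-layer
# neuron, the exact max/min preactivation over a box input domain reduces to
# two linear programs, so its stability can be certified with Gurobi directly.
import gurobipy as gp
from gurobipy import GRB

def first_layer_preactivation_bounds(w, b, lo, hi):
    """w: weight vector, b: bias, lo/hi: per-input lower/upper bounds (lists)."""
    m = gp.Model()
    m.Params.OutputFlag = 0                      # silence solver logging
    x = m.addVars(len(w), lb=lo, ub=hi)
    pre = gp.quicksum(w[j] * x[j] for j in range(len(w))) + b
    m.setObjective(pre, GRB.MAXIMIZE)
    m.optimize()
    upper = m.ObjVal
    m.setObjective(pre, GRB.MINIMIZE)
    m.optimize()
    lower = m.ObjVal
    # the neuron is stably inactive if upper <= 0, stably active if lower >= 0
    return lower, upper
```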
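
For the Experiment Setup row: a minimal training-loop sketch with the stated hyperparameters (Adam, initial learning rate 1e-3, 50 epochs, batch size 128) and an ℓ1 penalty added to the loss; the coefficient l1_coef and the choice to penalize all parameters are assumptions.

```python
# Minimal training-loop sketch with the stated hyperparameters (Adam, lr 1e-3,
# 50 epochs, batch size 128) plus an l1 weight penalty added to the loss.
# The l1 coefficient and penalizing all parameters (including biases) are
# assumptions; the paper sweeps several l1 values and trains 5 seeds per value.
import torch
import torch.nn as nn

def train(model, train_loader, l1_coef: float = 1e-4, epochs: int = 50, device: str = "cpu"):
    model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in train_loader:
            x, y = x.view(x.size(0), -1).to(device), y.to(device)  # flatten images for the MLP
            loss = ce(model(x), y)
            loss = loss + l1_coef * sum(p.abs().sum() for p in model.parameters())
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```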