Weightless: Lossy weight encoding for deep neural network compression

Authors: Brandon Reagen, Udit Gupta, Bob Adolf, Michael Mitzenmacher, Alexander Rush, Gu-Yeon Wei, David Brooks

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate Weightless on three networks commonly used to study compression: LeNet-300-100, LeNet-5 (LeCun et al., 1998), and VGG-16 (Simonyan & Zisserman, 2015). The LeNet networks use MNIST (LeCun & Cortes, 1998) and VGG-16 uses ImageNet (Russakovsky et al., 2015). Table 1. Experimental Setup.
Researcher Affiliation | Collaboration | 1Harvard University, Cambridge, MA 2Facebook, Menlo Park, CA. Correspondence to: Brandon Reagen <reagen@fas.harvard.edu>.
Pseudocode | Yes | Algorithm 1 Weightless compression method
Open Source Code | No | The paper mentions using Keras and implementing the Bloomier filter in-house but does not provide any statement or link indicating that the source code for their Weightless method is publicly available.
Open Datasets | Yes | The LeNet networks use MNIST (LeCun & Cortes, 1998) and VGG-16 uses ImageNet (Russakovsky et al., 2015).
Dataset Splits | No | The paper mentions 'model validation error' and refers to retraining, but it does not provide specific details on the dataset splits (e.g., percentages, sample counts, or explicit methodology for training, validation, and test sets).
Hardware Specification | Yes | On an Intel i7-6700K CPU, reconstructing (decoding) the largest layers of each model takes 0.52, 1.3, and 22.8 seconds for MNIST-300-100, LeNet-5, and VGG-16 respectively; on the ARM A53 mobile-class CPU used in smartphones since 2014 (Qualcomm, 2018), the same layers take 7.1, 18, and 296 seconds to reconstruct. (An illustrative sketch of the Bloomier-filter lookup behind this decoding step follows the table.)
Software Dependencies | No | The paper mentions 'Keras (Chollet, 2017)' and 'Mersenne Twister pseudorandom number generator' but does not provide specific version numbers for Keras or other software dependencies.
Experiment Setup | Yes | Table 1 shows the models and simplification parameters used in our experiments. We apply Weightless to the largest layers in each model. ... Weights are pruned using either a magnitude threshold or dynamic network surgery (see Section 3.2). Once pruned, weights are clustered with k-means. We found that careful choice of initial seeds helped to minimize the number of clusters needed. We use density-based initialization on a per-layer basis, where initial cluster values are assigned based on the input weight distribution. Tuning the filter size: The use of Bloomier filters introduces an additional hyperparameter t that sets the filter's encoding strength... Experimentally, for the models considered, we find that t typically falls in the range of 6 to 9. (Illustrative sketches of the pruning, clustering, and filter-lookup steps follow the table.)
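
The Experiment Setup row describes two preprocessing steps, magnitude-threshold pruning and per-layer k-means clustering with density-based seeding, before a layer's weights are handed to the Bloomier filter encoder. The sketch below is one plausible reading of those steps; the function names, the quantile-based seeding, and the scikit-learn dependency are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def prune_by_magnitude(weights, threshold):
    """Zero out weights whose magnitude falls below the threshold."""
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

def cluster_surviving_weights(weights, mask, n_clusters):
    """Quantize surviving weights with k-means; centroids are seeded from
    quantiles of the surviving-weight distribution, a simple density-based
    initialization (the paper's exact seeding scheme is not spelled out)."""
    survivors = weights[mask].reshape(-1, 1)
    quantiles = np.linspace(0.0, 1.0, n_clusters)
    init = np.quantile(survivors, quantiles).reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, init=init, n_init=1).fit(survivors)
    quantized = weights.copy()
    quantized[mask] = km.cluster_centers_[km.labels_].ravel()
    return quantized, km.labels_, km.cluster_centers_.ravel()
```

Under this reading, each pruned-and-clustered layer reduces to a set of (weight index, cluster label) pairs, which is the form the Bloomier filter then encodes.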
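
The Hardware Specification row quotes per-layer reconstruction times, and the setup row notes the hyperparameter t that sets the filter's encoding strength; both refer to querying a Bloomier filter for every weight position. The following is a generic sketch of such a lookup (XOR of k probed t-bit cells with a per-key mask), in which any decoded value outside the cluster range is read as a weight that was pruned to zero. The hashing scheme, function names, and parameters are assumptions for illustration, not the paper's in-house implementation.

```python
import hashlib

def _hashes(key, k, m, t, seed=0):
    """Derive k cell indices and a t-bit mask for `key` from a single digest."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    idxs = [int.from_bytes(digest[4 * i:4 * i + 4], "big") % m for i in range(k)]
    mask = int.from_bytes(digest[-4:], "big") % (1 << t)
    return idxs, mask

def query(table, key, k, t, num_clusters):
    """Reconstruct one weight: XOR the k probed cells with the key's mask;
    a result >= num_clusters is interpreted as a pruned (zero) weight."""
    m = len(table)
    idxs, mask = _hashes(key, k, m, t)
    value = mask
    for i in idxs:
        value ^= table[i]
    return value if value < num_clusters else None  # None -> weight is zero

# Decoding a layer amounts to one query per weight position, e.g. (hypothetical sizes):
# layer = [query(table, i, k=4, t=8, num_clusters=16) for i in range(n_weights)]
```

A larger t makes spurious in-range values rarer (stronger encoding) at the cost of wider table cells, which is consistent with the paper's report that t typically falls between 6 and 9 for the models considered.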