Scalable Model Compression by Entropy Penalized Reparameterization

Authors: Deniz Oktay, Johannes Ballé, Saurabh Singh, Abhinav Shrivastava

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate the method on the MNIST, CIFAR-10 and ImageNet classification benchmarks using six distinct model architectures. Our results show that state-of-the-art model compression can be achieved in a scalable and general way without requiring complex procedures such as multi-stage training."
Researcher Affiliation | Collaboration | Deniz Oktay (Princeton University, Princeton, NJ, USA; doktay@cs.princeton.edu); Johannes Ballé (Google Research, Mountain View, CA, USA; jballe@google.com); Saurabh Singh (Google Research, Mountain View, CA, USA; saurabhsingh@google.com); Abhinav Shrivastava (University of Maryland, College Park, College Park, MD, USA; abhinav@cs.umd.edu)
Pseudocode | No | The paper describes the method in prose and through diagrams (Figure 2, Figure 3), but does not contain a formal pseudocode or algorithm block.
Open Source Code | Yes | "In addition, our code is publicly available" (footnote 4: "Refer to examples in https://github.com/tensorflow/compression").
Open Datasets | Yes | "We evaluate the method on the MNIST, CIFAR-10 and ImageNet classification benchmarks... LeNet-300-100 (LeCun et al., 1998) and LeNet-5-Caffe on MNIST (LeCun and Cortes, 2010), as well as VGG-16 (Simonyan and Zisserman, 2015) and ResNet-20 (He et al., 2016b; Zagoruyko and Komodakis, 2016) with width multiplier 4 (ResNet-20-4) on CIFAR-10 (Zagoruyko and Komodakis, 2016). For our ImageNet experiments, we evaluate our method on the ResNet-18 and ResNet-50 (He et al., 2016a) networks."
Dataset Splits | No | The paper uses well-known datasets (MNIST, CIFAR-10, ImageNet) and describes training and evaluation procedures, but it does not explicitly specify how the datasets were split into training, validation, and test sets (e.g., percentages, sample counts, or explicit use of a validation set).
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU models, CPU types, or cloud computing instance specifications.
Software Dependencies | No | The paper mentions software components such as 'tensorflow/compression', 'Caffe', and 'Torch', but does not provide version numbers for these or any other software dependencies.
Experiment Setup | Yes | "We found it useful to use two separate optimizers: one to optimize the variables of the probability models q_i, and one to optimize the reparameterizations Φ and variables of the parameter decoders Ψ. While the latter is chosen to be the same optimizer typically used for the task/architecture, the former is always Adam (Kingma and Ba, 2015) with a learning rate of 10^-4. ... We train the networks using Adam with a constant learning rate of 0.001 for 200,000 iterations. ... For both VGG-16 and ResNet-20-4, we use momentum of 0.9 with an initial learning rate of 0.1, and decay by 0.2 at iterations 256,000, 384,000, and 448,000 for a total of 512,000 iterations."
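
The quoted setup amounts to maintaining two optimizers over disjoint variable groups and stepping both on each iteration. Below is a minimal TensorFlow 2 sketch of that pattern. It is not the authors' implementation (their code lives in the tensorflow/compression repository): the variable groups (prob_model_vars standing in for the probability models q_i, reparam_vars for Φ and the decoder variables Ψ) and the toy loss are hypothetical placeholders, while the learning-rate schedule mirrors the quoted VGG-16 / ResNet-20-4 settings.

    import tensorflow as tf

    # Hypothetical variable groups: probability-model parameters q_i vs.
    # reparameterizations Phi and parameter-decoder variables Psi.
    prob_model_vars = [tf.Variable(tf.zeros([10]), name="q")]
    reparam_vars = [tf.Variable(tf.random.normal([10]), name="phi")]

    # Optimizer for the probability models: always Adam at 1e-4 (per the quote).
    prob_opt = tf.keras.optimizers.Adam(learning_rate=1e-4)

    # Optimizer for the reparameterizations: whatever the task normally uses,
    # here SGD with momentum 0.9 and the quoted piecewise schedule
    # (initial rate 0.1, multiplied by 0.2 at 256k, 384k, and 448k iterations).
    lr_schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
        boundaries=[256_000, 384_000, 448_000],
        values=[0.1, 0.1 * 0.2, 0.1 * 0.2**2, 0.1 * 0.2**3])
    task_opt = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)

    @tf.function
    def train_step():
        with tf.GradientTape(persistent=True) as tape:
            # Placeholder standing in for the task loss plus entropy penalty.
            loss = (tf.reduce_sum(tf.square(reparam_vars[0]))
                    + tf.reduce_sum(tf.square(prob_model_vars[0])))
        prob_grads = tape.gradient(loss, prob_model_vars)
        reparam_grads = tape.gradient(loss, reparam_vars)
        del tape
        prob_opt.apply_gradients(zip(prob_grads, prob_model_vars))
        task_opt.apply_gradients(zip(reparam_grads, reparam_vars))
        return loss

Keeping the Adam rate for the probability models fixed at 1e-4 while the task optimizer follows its usual schedule reflects the split the quote describes: the probability models are trained the same way regardless of task or architecture.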