DKM: Differentiable k-Means Clustering Layer for Neural Network Compression
Authors: Minsik Cho, Keivan Alizadeh-Vahid, Saurabh Adya, Mohammad Rastegari
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated DKM-based compression on various DNN models for computer vision and natural language processing (NLP) tasks. Our results demonstrate that DKM delivers superior compression and accuracy trade-off on ImageNet1k and GLUE benchmarks. For example, DKM-based compression can offer 74.5% top-1 ImageNet1k accuracy on ResNet50 with 3.3MB model size (29.4x model compression factor). |
| Researcher Affiliation | Industry | Minsik Cho, Keivan Alizadeh-Vahid, Saurabh Adya, Mohammad Rastegari ({minsik, kalizadehvahid, sadya, mrastegari}@apple.com) |
| Pseudocode | No | The paper describes the algorithm steps in paragraph form and references Figure 2 for an iterative process, but does not include a dedicated pseudocode or algorithm block (an illustrative sketch of the iterative step is given after the table). |
| Open Source Code | No | The paper does not contain any explicit statement about making its source code available, nor does it provide a link to a code repository for the DKM methodology. |
| Open Datasets | Yes | We evaluated DKM-based compression on various DNN models for computer vision and natural language processing (NLP) tasks. Our results demonstrate that DKM delivers superior compression and accuracy trade-off on ImageNet1k and GLUE benchmarks. |
| Dataset Splits | No | The paper mentions "validation/test accuracies" but does not specify the explicit splits (e.g., percentages or sample counts) used for training, validation, and testing. It only refers to standard data augmentation techniques for ImageNet1k and default setup from Hugging Face for GLUE. |
| Hardware Specification | Yes | All our experiments with DKM were done on two x86 Linux machines with eight NVIDIA V100 GPUs each in a public cloud infrastructure. |
| Software Dependencies | No | The paper mentions PyTorch and TensorFlow-Eager but does not provide specific version numbers for these or any other software dependencies. "The iterative process will be dynamically executed imperatively in PyTorch (Paszke et al., 2019) and TensorFlow-Eager (Agrawal et al., 2019)..." |
| Experiment Setup | Yes | We used an SGD optimizer with momentum 0.9, and fixed the learning rate at 0.008 (without individual hyper-parameter tuning) for all the experiments for DKM. Each compression scheme starts with publicly available pre-trained models. The ε is set to 1e-4 and the iteration limit is 5. We set the mini-batch size to 128 per GPU (i.e., a global mini-batch size of 2048) and ran for 200 epochs for all DKM cases (see the training-setup sketch after the table). |
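
Since the paper describes its iterative clustering only in prose, the following is a minimal, non-authoritative sketch of a differentiable (soft-assignment) k-means step in PyTorch. The softmax temperature, the Euclidean distance choice, and the function and variable names are assumptions for illustration; only the convergence threshold (1e-4) and the iteration limit (5) come from the paper's stated setup.

```python
import torch

def dkm_soft_cluster(weights, centers, eps=1e-4, max_iters=5, temperature=1.0):
    """Illustrative DKM-style soft clustering pass (not the authors' code).

    weights: (N, d) flattened layer weights; centers: (K, d) cluster centers.
    Returns the soft-quantized weights and the updated centers.
    """
    for _ in range(max_iters):
        # Pairwise distances between every weight and every center, shape (N, K).
        dist = torch.cdist(weights, centers)
        # Soft assignment: softmax over negative distances (attention-like matrix).
        attn = torch.softmax(-dist / temperature, dim=1)
        # Recompute centers as assignment-weighted averages of the weights.
        new_centers = (attn.t() @ weights) / attn.sum(dim=0, keepdim=True).t()
        converged = torch.norm(new_centers - centers) < eps  # paper's threshold: 1e-4
        centers = new_centers
        if converged:
            break
    # Soft-quantized weights; gradients flow through the assignments and centers.
    return attn @ centers, centers


# Example: cluster 1024 scalar weights into 16 clusters (4-bit palettization).
w = torch.randn(1024, 1, requires_grad=True)
c = torch.randn(16, 1, requires_grad=True)
w_soft, c_new = dkm_soft_cluster(w, c)
```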
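
The reported optimizer settings can likewise be summarized as a short configuration sketch. This is not the authors' training script: the torchvision constructor (whose `weights` argument follows recent torchvision versions) and the variable names are assumptions, while the learning rate, momentum, per-GPU batch size, GPU count, and epoch count are taken from the quoted setup.

```python
import torch
import torchvision

# Start from a publicly available pre-trained model, as the paper states.
model = torchvision.models.resnet50(weights="IMAGENET1K_V1")

# Fixed settings quoted from the paper: lr 0.008, momentum 0.9, no per-run tuning.
optimizer = torch.optim.SGD(model.parameters(), lr=0.008, momentum=0.9)

# 128 samples per GPU across 2 machines x 8 V100s = 16 GPUs -> global batch of 2048.
per_gpu_batch_size = 128
num_gpus = 16
global_batch_size = per_gpu_batch_size * num_gpus  # 2048
num_epochs = 200
```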