DKM: Differentiable k-Means Clustering Layer for Neural Network Compression
Authors: Minsik Cho, Keivan Alizadeh-Vahid, Saurabh Adya, Mohammad Rastegari
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated DKM-based compression on various DNN models for computer vision and natural language processing (NLP) tasks. Our results demonstrate that DKM delivers superior compression and accuracy trade-off on ImageNet1k and GLUE benchmarks. For example, DKM-based compression can offer 74.5% top-1 ImageNet1k accuracy on ResNet50 with 3.3MB model size (29.4x model compression factor). |
| Researcher Affiliation | Industry | Minsik Cho, Keivan Alizadeh-Vahid, Saurabh Adya, Mohammad Rastegari ({minsik, kalizadehvahid, sadya, mrastegari}@apple.com) |
| Pseudocode | No | The paper describes the algorithm steps in paragraph form and references Figure 2 for an iterative process, but does not include a dedicated pseudocode or algorithm block (an illustrative sketch of the iterative step is given after the table). |
| Open Source Code | No | The paper does not contain any explicit statement about making its source code available, nor does it provide a link to a code repository for the DKM methodology. |
| Open Datasets | Yes | We evaluated DKM-based compression on various DNN models for computer vision and natural language processing (NLP) tasks. Our results demonstrate that DKM delivers superior compression and accuracy trade-off on ImageNet1k and GLUE benchmarks. |
| Dataset Splits | No | The paper mentions "validation/test accuracies" but does not specify the explicit splits (e.g., percentages or sample counts) used for training, validation, and testing. It only refers to standard data augmentation techniques for ImageNet1k and default setup from Hugging Face for GLUE. |
| Hardware Specification | Yes | All our experiments with DKM were done on two x86 Linux machines with eight NVIDIA V100 GPUs each in a public cloud infrastructure. |
| Software Dependencies | No | The paper mentions PyTorch and TensorFlow-Eager but does not provide specific version numbers for these or any other software dependencies. "The iterative process will be dynamically executed imperatively in PyTorch (Paszke et al., 2019) and TensorFlow-Eager (Agrawal et al., 2019)..." |
| Experiment Setup | Yes | We used an SGD optimizer with momentum 0.9, and fixed the learning rate at 0.008 (without individual hyper-parameter tuning) for all the experiments for DKM. Each compression scheme starts with publicly available pre-trained models. The ε is set to 1e-4 and the iteration limit is 5. We set the mini-batch size to 128 per GPU (i.e., a global mini-batch size of 2048) and ran for 200 epochs for all DKM cases (see the training-setup sketch after the table). |
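
Since the paper describes its iterative clustering only in prose, the following is a minimal, non-authoritative sketch of a differentiable (soft-assignment) k-means step in PyTorch. The softmax temperature, the Euclidean distance choice, and the function and variable names are assumptions for illustration; only the convergence threshold (1e-4) and the iteration limit (5) come from the paper's stated setup.

```python
import torch

def dkm_soft_cluster(weights, centers, eps=1e-4, max_iters=5, temperature=1.0):
    """Illustrative DKM-style soft clustering pass (not the authors' code).

    weights: (N, d) flattened layer weights; centers: (K, d) cluster centers.
    Returns the soft-quantized weights and the updated centers.
    """
    for _ in range(max_iters):
        # Pairwise distances between every weight and every center, shape (N, K).
        dist = torch.cdist(weights, centers)
        # Soft assignment: softmax over negative distances (attention-like matrix).
        attn = torch.softmax(-dist / temperature, dim=1)
        # Recompute centers as assignment-weighted averages of the weights.
        new_centers = (attn.t() @ weights) / attn.sum(dim=0, keepdim=True).t()
        converged = torch.norm(new_centers - centers) < eps  # paper's threshold: 1e-4
        centers = new_centers
        if converged:
            break
    # Soft-quantized weights; gradients flow through the assignments and centers.
    return attn @ centers, centers


# Example: cluster 1024 scalar weights into 16 clusters (4-bit palettization).
w = torch.randn(1024, 1, requires_grad=True)
c = torch.randn(16, 1, requires_grad=True)
w_soft, c_new = dkm_soft_cluster(w, c)
```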
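
The reported optimizer settings can likewise be summarized as a short configuration sketch. This is not the authors' training script: the torchvision constructor (whose `weights` argument follows recent torchvision versions) and the variable names are assumptions, while the learning rate, momentum, per-GPU batch size, GPU count, and epoch count are taken from the quoted setup.

```python
import torch
import torchvision

# Start from a publicly available pre-trained model, as the paper states.
model = torchvision.models.resnet50(weights="IMAGENET1K_V1")

# Fixed settings quoted from the paper: lr 0.008, momentum 0.9, no per-run tuning.
optimizer = torch.optim.SGD(model.parameters(), lr=0.008, momentum=0.9)

# 128 samples per GPU across 2 machines x 8 V100s = 16 GPUs -> global batch of 2048.
per_gpu_batch_size = 128
num_gpus = 16
global_batch_size = per_gpu_batch_size * num_gpus  # 2048
num_epochs = 200
```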