LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging

Authors: Jinuk Kim, Marwa El Halabi, Mingi Ji, Hyun Oh Song

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results demonstrate that our method consistently outperforms existing depth compression and layer pruning methods on various network architectures, both on image classification and generation tasks.
Researcher Affiliation | Collaboration | 1 Department of Computer Science and Engineering, Seoul National University; 2 Neural Processing Research Center; 3 Samsung SAIT AI Lab, Montreal; 4 Google.
Pseudocode | Yes | Algorithm 1 (DP algorithm for Problem (5)): input importance I, latency T, latency budget T0, discretization level P; initialize M[0, t] ← 0 for t ≥ 0, M[l, t] ← −∞ for t < 0, A[0, t] ← ∅, C[0, t] ← ∅; discretize latency values in T; for l = 1 to L do... Algorithm 2 (LayerMerge): input network f, latency budget T0, discretization level P; for i = 0 to L − 1 do... (see the DP sketch after the table).
Open Source Code | Yes | We release the code at https://github.com/snu-mllab/LayerMerge.
Open Datasets | Yes | We apply our method on ResNet-34 and MobileNetV2 models (He et al., 2016; Sandler et al., 2018) for the image classification task, and on the DDPM model (Ho et al., 2020) for the image generation task... on the ImageNet dataset... on the CIFAR10 dataset.
Dataset Splits | Yes | We report the last top-1 accuracy of the compressed model after fine-tuning, evaluated on the validation set, and its corresponding latency speedup. In particular, we use a fine-tuning subset of size 4% of the total training dataset size for ImageNet, and 1% for CIFAR10. The separate subset is also the same size as the fine-tuning subset (see the subset-sampling sketch after the table).
Hardware Specification | Yes | We construct the latency lookup table of each method on an RTX 2080 Ti GPU and report the wall-clock latency speed-up of the compressed networks measured on the same device... GPU hours for constructing the importance table are measured on an RTX 3090, and those for the latency table on an RTX 2080 Ti (see the latency-timing sketch after the table).
Software Dependencies | No | The paper mentions PyTorch and TensorRT (Paszke et al., 2017; Vanholder, 2016) but does not specify their versions or those of other software dependencies.
Experiment Setup | Yes | For ResNet-34, we fine-tune each pruned network for 90 epochs, following the same fine-tuning recipe as HALP (Shen et al., 2022). For MobileNetV2, we fine-tune for 180 epochs, using the same fine-tuning recipe as Kim et al. (2023). For DDPM, we follow the fine-tuning and sampling recipe of Diff-Pruning (Fang et al., 2023), except for the learning rate, which we set to 4 × 10⁻⁴ since it leads to better performance. When measuring latency speedup, we use a batch size of 128 for the ImageNet dataset and 64 for the CIFAR10 dataset (see the configuration sketch after the table).
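
The Pseudocode row only quotes the skeleton of Algorithm 1. As a rough illustration of the kind of table-filling it describes, below is a minimal knapsack-style dynamic program in Python that picks one compression option per layer position to maximize total importance under a discretized latency budget. The helper names, the discretization scheme, and the toy numbers are hypothetical placeholders for illustration, not the paper's actual importance and latency tables or its exact recursion.

# Minimal knapsack-style DP sketch: one compression option per layer position,
# maximizing total importance subject to a discretized latency budget.
# The `t - c >= 0` check plays the role of the boundary case (M[l, t] for t < 0)
# quoted from Algorithm 1; everything else here is an illustrative assumption.

def discretize(latency, budget, num_buckets):
    """Map a latency value to an integer bucket in [0, num_buckets]."""
    return round(latency / budget * num_buckets)

def dp_select(importance, latency, budget, num_buckets=100):
    """
    importance[l][k], latency[l][k]: importance/latency of option k at position l.
    Returns (best_importance, chosen_option_per_position), or None if infeasible.
    """
    L = len(importance)
    NEG = float("-inf")
    # M[t] = best total importance over the positions processed so far,
    #        using at most t discretized latency units.
    M = [0.0] * (num_buckets + 1)
    choice = [[None] * (num_buckets + 1) for _ in range(L)]

    for l in range(L):
        new_M = [NEG] * (num_buckets + 1)
        for t in range(num_buckets + 1):
            for k, (imp, lat) in enumerate(zip(importance[l], latency[l])):
                c = discretize(lat, budget, num_buckets)
                if t - c >= 0 and M[t - c] != NEG:
                    cand = M[t - c] + imp
                    if cand > new_M[t]:
                        new_M[t] = cand
                        choice[l][t] = k
        M = new_M

    # Find the best feasible cell, then backtrack the chosen options.
    best_t = max(range(num_buckets + 1), key=lambda t: M[t])
    if M[best_t] == NEG:
        return None
    chosen, t = [], best_t
    for l in reversed(range(L)):
        k = choice[l][t]
        chosen.append(k)
        t -= discretize(latency[l][k], budget, num_buckets)
    return M[best_t], list(reversed(chosen))

# Toy example: 3 positions, 2 options each (e.g. "keep" vs. "merge").
importance = [[1.0, 0.4], [0.8, 0.6], [0.9, 0.2]]
latency    = [[5.0, 2.0], [4.0, 1.5], [6.0, 2.5]]
print(dp_select(importance, latency, budget=9.0, num_buckets=90))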
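
The Dataset Splits row reports fixed-fraction fine-tuning subsets (4% of ImageNet, 1% of CIFAR10) plus a separate subset of the same size. The sketch below shows one common way to draw such subsets with PyTorch; the dummy dataset, the fixed seeds, and the sample_subset helper are illustrative assumptions, not the authors' sampling code.

# Drawing a fixed-fraction random subset from a PyTorch-style Dataset.
import torch
from torch.utils.data import Subset, TensorDataset

def sample_subset(dataset, fraction, seed=0):
    """Return a random Subset containing `fraction` of `dataset`."""
    generator = torch.Generator().manual_seed(seed)
    num_samples = max(1, int(len(dataset) * fraction))
    indices = torch.randperm(len(dataset), generator=generator)[:num_samples]
    return Subset(dataset, indices.tolist())

# Stand-in for a real training set (e.g. torchvision.datasets.ImageNet).
train_set = TensorDataset(torch.randn(1000, 3, 32, 32), torch.randint(0, 10, (1000,)))
finetune_set = sample_subset(train_set, fraction=0.01)           # 1%, as for CIFAR10
separate_set = sample_subset(train_set, fraction=0.01, seed=1)   # separate, same-size subset
print(len(finetune_set), len(separate_set))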
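
The Hardware Specification and Experiment Setup rows describe wall-clock latency measurement on a fixed GPU with batch size 128 (ImageNet) or 64 (CIFAR10). The sketch below shows one standard way to time a forward pass in PyTorch with warm-up and CUDA synchronization; the warm-up count, number of timed runs, and the example model are assumptions, and the paper's pipeline additionally involves TensorRT, which is not reproduced here.

# Averaged forward-pass wall-clock timing with warm-up and CUDA synchronization.
import time
import torch
import torchvision

@torch.no_grad()
def measure_latency(model, input_shape, device, warmup=10, runs=50):
    """Average forward-pass wall-clock time (seconds) for one batch."""
    model = model.to(device).eval()
    x = torch.randn(*input_shape, device=device)
    for _ in range(warmup):                      # warm-up to stabilize clocks/caches
        model(x)
    if device.type == "cuda":
        torch.cuda.synchronize(device)
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    if device.type == "cuda":
        torch.cuda.synchronize(device)           # wait for queued GPU kernels
    return (time.perf_counter() - start) / runs

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
baseline = torchvision.models.resnet34(weights=None)
latency = measure_latency(baseline, (128, 3, 224, 224), device)  # batch size 128, ImageNet-sized inputs
print(f"avg latency per batch: {latency * 1e3:.1f} ms")
# Speed-up would be the baseline latency divided by the compressed model's latency.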
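
Finally, the numbers quoted in the Experiment Setup row, restated as a plain configuration dict for quick reference; optimizer and schedule details are deliberately omitted because they come from the cited recipes (HALP, Kim et al. 2023, Diff-Pruning) rather than from this excerpt.

# Convenience restatement of the quoted fine-tuning and latency-measurement settings.
FINETUNE_CONFIG = {
    "resnet34":    {"dataset": "ImageNet", "epochs": 90,  "recipe": "HALP (Shen et al., 2022)"},
    "mobilenetv2": {"dataset": "ImageNet", "epochs": 180, "recipe": "Kim et al. (2023)"},
    "ddpm":        {"dataset": "CIFAR10",  "recipe": "Diff-Pruning (Fang et al., 2023)",
                    "learning_rate": 4e-4},
}
LATENCY_BATCH_SIZE = {"ImageNet": 128, "CIFAR10": 64}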