EPSD: Early Pruning with Self-Distillation for Efficient Model Compression

Authors: Dong Chen, Ning Liu, Yichen Zhu, Zhengping Che, Rui Ma, Fachao Zhang, Xiaofeng Mou, Yi Chang, Jian Tang

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our evaluation covered diverse benchmarks (CIFAR-10/100, Tiny-ImageNet, full ImageNet, CUB-200-2011, and Pascal VOC), with EPSD outperforming advanced pruning and SD techniques.
Researcher Affiliation | Collaboration | Dong Chen1,2*, Ning Liu2*, Yichen Zhu2, Zhengping Che2, Rui Ma1, Fachao Zhang2, Xiaofeng Mou2, Yi Chang1, Jian Tang2 (1School of Artificial Intelligence, Jilin University; 2Midea Group)
Pseudocode | No | The paper describes its method in steps and uses mathematical equations, but it does not include a clearly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | No | The paper does not provide any statement or link indicating that source code for the methodology is openly available.
Open Datasets | Yes | We evaluate EPSD on various benchmarks, including CIFAR-10/CIFAR-100 (Krizhevsky, Hinton et al. 2009), Tiny-ImageNet, and full ImageNet (Deng et al. 2009), using diverse networks and comparing with the Simple Combination approach, advanced pruning and SD methods.
Dataset Splits | No | The paper refers to training and testing but does not explicitly provide the percentages, sample counts, or split methodology for the train/validation/test partitions needed for reproduction.
Hardware Specification | No | The paper discusses training effort and wall time but does not specify the exact hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper does not list specific software dependencies or version numbers (e.g., PyTorch 1.x, Python 3.x).
Experiment Setup | Yes | We incorporate three distinct SD algorithms (CS-KD (Yun et al. 2020), PS-KD (Kim et al. 2021), and DLB (Shen et al. 2022)) into EPSD to ensure a comprehensive evaluation. Our experiments are conducted on CIFAR-10/100 and Tiny-ImageNet datasets across five sparsity ratios (36%, 59%, 79%, 90%, 95%). To ensure fairness in comparison, we employ identical hyper-parameters for training each dataset.
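Since no official code is released (see "Open Source Code" above), the following is a minimal, hypothetical sketch of how the reported sparsity sweep could be reproduced: global magnitude pruning at each of the five ratios combined with a generic self-distillation objective. The helper names (`prune_model`, `self_distill_loss`), the choice of magnitude pruning, the temperature/alpha values, and the ResNet-18/CIFAR-100 setting are assumptions for illustration, not the authors' EPSD implementation.

```python
# Illustrative sketch only (not the authors' code): sweep the five sparsity
# ratios reported in the paper, pruning once and then training with a generic
# self-distillation loss under identical hyper-parameters per dataset.
import torch
import torch.nn.utils.prune as prune
import torchvision

SPARSITY_RATIOS = [0.36, 0.59, 0.79, 0.90, 0.95]  # ratios quoted above

def prune_model(model, amount):
    """Apply global unstructured L1 (magnitude) pruning to Conv2d/Linear weights."""
    params = [(m, "weight") for m in model.modules()
              if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear))]
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=amount)
    return model

def self_distill_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
    """Generic SD objective: cross-entropy on labels plus KL to detached teacher logits."""
    ce = torch.nn.functional.cross_entropy(student_logits, labels)
    kl = torch.nn.functional.kl_div(
        torch.log_softmax(student_logits / temperature, dim=1),
        torch.softmax(teacher_logits.detach() / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * ce + (1 - alpha) * kl

for ratio in SPARSITY_RATIOS:
    model = torchvision.models.resnet18(num_classes=100)  # e.g. a CIFAR-100 setting
    model = prune_model(model, amount=ratio)
    # ... train `model` with `self_distill_loss`, keeping hyper-parameters fixed
    # across ratios to mirror the "identical hyper-parameters" statement above.
```

The specific self-distillation variants used in the paper (CS-KD, PS-KD, DLB) define the teacher signal differently (e.g., from past predictions or sibling batches); the KL term here merely stands in for that component.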