Trainability Preserving Neural Pruning

Authors: Huan Wang, Yun Fu

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical studies on linear MLP networks show that TPP can perform on par with the oracle trainability recovery scheme. On nonlinear ConvNets (ResNet56/VGG19) on CIFAR10/100, TPP outperforms the other counterpart approaches by an obvious margin. Moreover, results on ImageNet-1K with ResNets suggest that TPP consistently performs more favorably against other top-performing structured pruning approaches. (Section 4: Experiments)
Researcher Affiliation | Collaboration | Huan Wang (1), Yun Fu (1,2); (1) Northeastern University, Boston, USA; (2) AInnovation Labs, Inc.; wang.huan@northeastern.edu, yunfu@ece.neu.edu
Pseudocode | Yes | Algorithm 1: Trainability Preserving Pruning (TPP)
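The report only names Algorithm 1; for a concrete picture, below is a minimal, hypothetical PyTorch sketch of a trainability-oriented pruning step in the spirit of TPP, in which filters slated for removal are decorrelated from the kept ones through a Gram-matrix penalty before pruning. The penalty form, the assumption that the pruned-filter indices are already chosen, and the coefficient are illustrative assumptions, not the authors' exact Algorithm 1.

```python
# Hypothetical sketch (not the authors' code): decorrelate filters scheduled
# for pruning from the kept ones via a Gram-matrix penalty, so that removing
# them later perturbs the remaining network less.
import torch

def decorrelation_penalty(weight: torch.Tensor, pruned_idx: torch.Tensor) -> torch.Tensor:
    """weight: (out_channels, in_channels, kh, kw) conv kernel;
    pruned_idx: indices of filters scheduled for removal (assumed given)."""
    w = weight.flatten(1)              # (out_channels, in_channels*kh*kw)
    gram = w @ w.t()                   # filter-filter correlation matrix
    mask = torch.zeros_like(gram)
    mask[pruned_idx, :] = 1.0          # entries involving pruned filters
    mask[:, pruned_idx] = 1.0
    return (gram * mask).pow(2).sum()  # drive those correlations toward zero

# Example use inside a training step (coefficient 1e-3 is a placeholder):
# loss = task_loss + 1e-3 * sum(decorrelation_penalty(m.weight, idx[m]) for m in convs)
```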
Open Source Code | Yes | Code: https://github.com/MingSun-Tse/TPP
Open Datasets | Yes | We first present some analyses with the MLP-7-Linear network on MNIST (LeCun et al., 1998). Then we compare our method to other plausible solutions with the ResNet56 (He et al., 2016) and VGG19 (Simonyan & Zisserman, 2015) networks on the CIFAR10 and CIFAR100 datasets (Krizhevsky, 2009), respectively. Next we evaluate our algorithm on ImageNet-1K (Deng et al., 2009) with ResNet34 and ResNet50 (He et al., 2016). All the datasets in this paper are public datasets with standard APIs in PyTorch (Paszke et al., 2019).
Dataset Splits | Yes | All the datasets in this paper are public datasets with standard APIs in PyTorch (Paszke et al., 2019). We employ these standard APIs for the train/test data split to keep a fair comparison with other methods. Table 2: Comparison on the ImageNet-1K validation set.
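As a concrete illustration of what "standard APIs in PyTorch" means for the data pipeline, here is a minimal torchvision sketch of the default CIFAR-10 train/test split; the transforms, normalization constants, batch size, and worker count are assumptions, not values taken from the paper.

```python
# Minimal sketch: CIFAR-10 via the standard torchvision API with its default
# train/test split (train=True / train=False). Values below are illustrative.
import torch
import torchvision
import torchvision.transforms as T

transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

train_set = torchvision.datasets.CIFAR10("./data", train=True,  download=True, transform=transform)
test_set  = torchvision.datasets.CIFAR10("./data", train=False, download=True, transform=transform)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True,  num_workers=4)
test_loader  = torch.utils.data.DataLoader(test_set,  batch_size=128, shuffle=False, num_workers=4)
```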
Hardware Specification | Yes | We conduct all our experiments using 4 NVIDIA V100 GPUs (16GB memory per GPU).
Software Dependencies | No | Official PyTorch ImageNet example; GReg-1/GReg-2 (Wang et al., 2021b); OrthConv (Wang et al., 2020); Rethinking the Value of Network Pruning (Liu et al., 2019b). All the datasets in this paper are public datasets with standard APIs in PyTorch (Paszke et al., 2019). (No specific version numbers for PyTorch or other libraries are provided.)
Experiment Setup | Yes | Table 5: Summary of training setups. In the parentheses of SGD are the momentum and weight decay. For the LR schedule, the first number is the initial LR; the second (in brackets) lists the epochs at which the LR is decayed by a factor of 1/10; and #epochs stands for the total number of epochs. Table 6: Hyper-parameters of our methods.
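To make the setup description concrete, below is a hedged PyTorch sketch of the schedule style Table 5 summarizes: SGD with momentum and weight decay, with the learning rate divided by 10 at fixed epochs over a fixed total number of epochs. The numeric values are placeholders, not the paper's settings.

```python
# Illustrative sketch of the Table 5 setup style (placeholder values):
# SGD(momentum, weight decay) + step LR decayed by 1/10 at given epochs.
import torch
import torchvision

model = torchvision.models.resnet50()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[30, 60, 90],  # LR decay epochs
                                                 gamma=0.1)                # factor 1/10

total_epochs = 120  # "#epochs" in Table 5
for epoch in range(total_epochs):
    # ... one epoch of training with train_loader goes here ...
    scheduler.step()
```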