Trainability Preserving Neural Pruning
Authors: Huan Wang, Yun Fu
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical studies on linear MLP networks show that TPP can perform on par with the oracle trainability recovery scheme. On nonlinear ConvNets (ResNet56/VGG19) on CIFAR10/100, TPP outperforms the other counterpart approaches by an obvious margin. Moreover, results on ImageNet-1K with ResNets suggest that TPP consistently performs more favorably against other top-performing structured pruning approaches. (Section 4, Experiments) |
| Researcher Affiliation | Collaboration | Huan Wang1 Yun Fu1,2 1Northeastern University, Boston, USA 2AInnovation Labs, Inc. wang.huan@northeastern.edu yunfu@ece.neu.edu |
| Pseudocode | Yes | Algorithm 1 Trainability Preserving Pruning (TPP) |
| Open Source Code | Yes | Code: https://github.com/MingSun-Tse/TPP. |
| Open Datasets | Yes | We first present some analyses with the MLP-7-Linear network on MNIST (LeCun et al., 1998). Then we compare our method to other plausible solutions with the ResNet56 (He et al., 2016) and VGG19 (Simonyan & Zisserman, 2015) networks on the CIFAR10 and CIFAR100 datasets (Krizhevsky, 2009), respectively. Next we evaluate our algorithm on ImageNet-1K (Deng et al., 2009) with ResNet34 and ResNet50 (He et al., 2016). All the datasets in this paper are public datasets with standard APIs in PyTorch (Paszke et al., 2019). |
| Dataset Splits | Yes | All the datasets in this paper are public datasets with standard APIs in PyTorch (Paszke et al., 2019). We employ these standard APIs for the train/test data split to keep a fair comparison with other methods. Table 2: Comparison on ImageNet-1K validation set. |
| Hardware Specification | Yes | We conduct all our experiments using 4 NVIDIA V100 GPUs (16GB memory per GPU). |
| Software Dependencies | No | Official PyTorch ImageNet example; GReg-1/GReg-2 (Wang et al., 2021b); OrthConv (Wang et al., 2020); Rethinking the value of network pruning (Liu et al., 2019b). All the datasets in this paper are public datasets with standard APIs in PyTorch (Paszke et al., 2019). (No specific version numbers for PyTorch or other libraries are provided.) |
| Experiment Setup | Yes | Table 5: Summary of training setups. In the parentheses of SGD are the momentum and weight decay. For the LR schedule, the first number is the initial LR; the second (in brackets) lists the epochs at which the LR is decayed by a factor of 1/10; and #epochs stands for the total number of epochs. Table 6: Hyper-parameters of our methods. |
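
The "Open Datasets" and "Dataset Splits" rows state that all datasets are loaded through standard PyTorch APIs, which fix the train/test split. Below is a minimal sketch (not the authors' code) of that pattern for CIFAR-10 using torchvision; the normalization constants, batch sizes, and augmentation choices are common defaults, not values taken from the paper.

```python
# Sketch: loading CIFAR-10 via the standard torchvision API.
# The train/test split is determined by the `train` flag, so no custom
# partitioning is needed -- matching the "standard APIs" claim above.
import torch
import torchvision
import torchvision.transforms as T

transform = T.Compose([
    T.ToTensor(),
    # Commonly used CIFAR-10 channel means/stds (illustrative, not from the paper).
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                          download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                         download=True, transform=transform)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                           shuffle=True, num_workers=4)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=256,
                                          shuffle=False, num_workers=4)
```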
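The "Experiment Setup" row describes the paper's recipe notation: SGD with (momentum, weight decay), an initial LR decayed by a factor of 1/10 at listed epochs, and a total epoch count. The sketch below shows how such a recipe maps onto a typical PyTorch training loop; the model choice and all concrete numbers (0.1, [30, 60], 90 epochs, momentum 0.9, weight decay 5e-4) are illustrative placeholders, not the values from Tables 5/6.

```python
# Sketch: one possible PyTorch realization of the "SGD (momentum, weight decay),
# LR decayed by 1/10 at given epochs, fixed #epochs" recipe format.
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision

# Stand-in model for illustration; the paper uses ResNet56/VGG19/ResNet34/ResNet50.
model = torchvision.models.resnet18(num_classes=10)
criterion = nn.CrossEntropyLoss()

# "SGD (0.9, 5e-4)" in the table's notation -> momentum=0.9, weight_decay=5e-4.
optimizer = optim.SGD(model.parameters(), lr=0.1,
                      momentum=0.9, weight_decay=5e-4)

# "0.1 [30, 60], 90 #epochs" -> start at LR 0.1, multiply by 0.1 at epochs 30 and 60.
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60], gamma=0.1)

num_epochs = 90
for epoch in range(num_epochs):
    for images, labels in train_loader:  # train_loader from the CIFAR-10 sketch above
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # apply the 1/10 decay at the listed milestone epochs
```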