Learning Pruning-Friendly Networks via Frank-Wolfe: One-Shot, Any-Sparsity, And No Retraining
Authors: Miao Lu, Xiaolong Luo, Tianlong Chen, Wuyang Chen, Dong Liu, Zhangyang Wang
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on CIFAR-10 and Tiny-ImageNet datasets demonstrate that our new framework named SFW-pruning consistently achieves the state-of-the-art performance on various benchmark DNNs over a wide range of pruning ratios. |
| Researcher Affiliation | Academia | 1) University of Science and Technology of China, 2) University of Texas at Austin |
| Pseudocode | Yes | Algorithm 1: Stochastic Frank-Wolfe with Momentum for Deep Neural Network Training; Algorithm 2: Stochastic Frank-Wolfe Pruning Framework (SFW-Pruning); Algorithm 3: Stochastic Frank-Wolfe Initialization Scheme (SFW-Init) |
| Open Source Code | Yes | Codes are available in https://github.com/VITA-Group/SFW-Once-for-All-Pruning. |
| Open Datasets | Yes | We conduct experiments via two popular architectures, ResNet-18 (He et al., 2016) and VGG-16 (Simonyan & Zisserman, 2014), on two benchmark datasets, CIFAR-10 (Krizhevsky et al., 2009) and Tiny-ImageNet (Wu et al., 2017). |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly provide details about a separate validation split (e.g., percentages or sample counts). While it discusses learning rate adjustments based on 5-epoch and 10-epoch average loss, it does not specify a distinct validation dataset split for this. |
| Hardware Specification | No | The paper does not specify the hardware used for experiments, such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | We summarize the key experiment setups, with hyperparameters of the implementation presented in Appendix A in detail. [...] Initial learning rate α_0 = 1.0; Training batch size = 128; Test batch size = 100; Radius τ = 15; K-frac {K_l}_{l=1}^L = 5%; Training epochs T = 180; Momentum ρ = 0.9 (Table 2); Learning rate κ = 0.001; Training iterations T = 390; Minimal scaling ϵ, ε = 0.01 (Table 3). [...] We decrease the learning rate by a factor of 10 at epochs 61 and 121. Also, we dynamically change the learning rate (Pokutta et al., 2020): the learning rate is multiplied by 0.7 if the 5-epoch average loss is greater than the 10-epoch average loss, and is increased by a factor of 1.06 if the opposite holds. Illustrative sketches of the SFW update and of this learning-rate rule are given after the table. |
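
The Radius τ, K-frac, Momentum ρ, and initial learning rate α_0 listed above are the hyperparameters of stochastic Frank-Wolfe (SFW) training over per-layer K-sparse polytopes (Algorithms 1–3 in the Pseudocode row). The snippet below is a minimal illustrative sketch of one SFW-with-momentum step, not the authors' released implementation (see the repository linked above); the function names `ksparse_polytope_lmo` and `sfw_momentum_step` and the layer-wise loop are our assumptions, while the linear minimization oracle follows the standard K-sparse-polytope form from Pokutta et al. (2020).

```python
import torch

def ksparse_polytope_lmo(direction: torch.Tensor, k: int, tau: float) -> torch.Tensor:
    """LMO over the K-sparse polytope C(k, tau): argmin_{v in C} <direction, v>
    places -tau * sign(direction) on the k largest-magnitude coordinates, 0 elsewhere."""
    v = torch.zeros_like(direction)
    flat = direction.flatten()
    idx = flat.abs().topk(k).indices
    v.view(-1)[idx] = -tau * torch.sign(flat[idx])
    return v

@torch.no_grad()
def sfw_momentum_step(params, momentum_buffers, lr, rho=0.9, tau=15.0, k_frac=0.05):
    """One stochastic Frank-Wolfe step with momentum, applied layer-wise.
    params: tensors whose .grad is already populated by a backward pass.
    momentum_buffers: one zero-initialized tensor per parameter (our assumption)."""
    for p, m in zip(params, momentum_buffers):
        if p.grad is None:
            continue
        # Exponential moving average of the stochastic gradient.
        m.mul_(rho).add_(p.grad, alpha=1.0 - rho)
        # LMO over this layer's K-sparse polytope (K = k_frac of its weights).
        k = max(1, int(k_frac * p.numel()))
        v = ksparse_polytope_lmo(m, k, tau)
        # Convex-combination update keeps the iterate inside the polytope.
        p.add_(v - p, alpha=lr)
```

Because the update is the convex combination θ ← (1 − α)θ + αv, the learning rate must lie in (0, 1], which is consistent with the reported initial value α_0 = 1.0.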
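
The dynamic learning-rate rule quoted in the Experiment Setup row combines a step decay at epochs 61 and 121 with the loss-based 0.7 / 1.06 adjustment of Pokutta et al. (2020). The helper below is a sketch under our own naming (`adjust_learning_rate`, `loss_history`, 1-indexed epochs) and is not taken from the paper or its repository; it would be called once per epoch after appending that epoch's average training loss.

```python
from collections import deque

def adjust_learning_rate(lr: float, epoch: int, loss_history: deque) -> float:
    """Sketch of the reported schedule; loss_history holds per-epoch training
    losses, most recent last."""
    # Step decay: divide by 10 at epochs 61 and 121 (epochs assumed 1-indexed).
    if epoch in (61, 121):
        lr /= 10.0
    # Dynamic rule (Pokutta et al., 2020): compare 5- vs. 10-epoch average loss.
    if len(loss_history) >= 10:
        recent = list(loss_history)[-10:]
        avg5 = sum(recent[-5:]) / 5.0
        avg10 = sum(recent) / 10.0
        if avg5 > avg10:      # loss trending up -> shrink the step size
            lr *= 0.7
        elif avg5 < avg10:    # loss trending down -> grow the step size
            lr *= 1.06
    return lr
```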