Network Pruning That Matters: A Case Study on Retraining Variants
Authors: Duong Hoang Le, Binh-Son Hua
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we conduct extensive experiments to verify and analyze the uncanny effectiveness of learning rate rewinding. |
| Researcher Affiliation | Collaboration | Duong H. Le, VinAI Research, Vietnam; Binh-Son Hua, VinAI Research and VinUniversity, Vietnam |
| Pseudocode | No | The paper describes various retraining techniques and pruning algorithms but does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | No | To facilitate reproducibility, we would release our implementation upon publication. |
| Open Datasets | Yes | For CIFAR-10 and CIFAR-100, we run each experiment three times and report mean ± std. For ImageNet, we run each experiment once. |
| Dataset Splits | Yes | To make a fair comparison between fine-tuning and no fine-tuning, we randomly split the conventional training set of CIFAR-10/CIFAR-100 (50000 images) into a train set (90% of the images) and a val set (the remaining 10%), and we then report the result of the best-validation models on the standard test set (including 5000 images). A hedged sketch of such a split follows the table. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU model, CPU type) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like PyTorch and refers to implementations from other works, but it does not specify version numbers for these software components. |
| Experiment Setup | Yes | Table 6: Training configuration for unpruned models. To train on CIFAR-10, we use Nesterov SGD with β = 0.9, batch size 64, weight decay 0.0001 for 160 epochs. To train on ImageNet, we use Nesterov SGD with β = 0.9, batch size 32, weight decay 0.0001 for 90 epochs. A hedged training-loop sketch follows the table. |
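
The Dataset Splits row describes a 90/10 train/validation split of the 50000-image CIFAR-10/CIFAR-100 training set, with the final numbers reported on the standard test set. The snippet below is a minimal sketch of such a split for CIFAR-10, assuming a standard PyTorch/torchvision setup; the transform, seed, and variable names are illustrative assumptions and not the authors' released code.

```python
# Minimal sketch of a 90%/10% train/val split of the CIFAR-10 training set,
# assuming torchvision is available. Not the authors' implementation.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()  # placeholder; the paper's augmentation pipeline is not quoted here

full_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

n_val = len(full_train) // 10        # 5000 images (10% of 50000)
n_train = len(full_train) - n_val    # 45000 images (90% of 50000)
train_set, val_set = random_split(
    full_train,
    [n_train, n_val],
    generator=torch.Generator().manual_seed(0),  # assumed seed; the paper does not state one
)
```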
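The Experiment Setup row quotes the Table 6 configuration for unpruned models (Nesterov SGD, momentum 0.9, weight decay 0.0001, batch size 64 and 160 epochs for CIFAR-10). Below is a minimal training-loop sketch under those settings, reusing `train_set` from the sketch above; the network choice, initial learning rate, and the absence of a learning-rate schedule are assumptions, since these details are not part of the quoted cell.

```python
# Minimal sketch of the quoted unpruned-model training configuration for CIFAR-10.
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision.models import resnet18

# Stand-in network; the paper evaluates several architectures not listed in the quote.
model = resnet18(num_classes=10)
criterion = nn.CrossEntropyLoss()

# Nesterov SGD with momentum 0.9 and weight decay 0.0001, as quoted from Table 6.
# The initial learning rate of 0.1 is an assumption; the quoted cell does not give it.
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                      weight_decay=1e-4, nesterov=True)

# Batch size 64 and 160 epochs for CIFAR-10, per the quoted Table 6 entry.
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)

for epoch in range(160):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```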