NTK-SAP: Improving neural network pruning by aligning training dynamics
Authors: Yite Wang, Dawei Li, Ruoyu Sun
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, our method achieves better performance than all baselines on multiple datasets. Our code is available at https://github.com/YiteWang/NTK-SAP. Empirically, we show that NTK-SAP, as a data-agnostic foresight pruning method, achieves state-of-the-art performance in multiple settings. |
| Researcher Affiliation | Academia | 1University of Illinois Urbana-Champaign, USA 2Shenzhen International Center for Industrial and Applied Mathematics, Shenzhen Research Institute of Big Data 3School of Data Science, The Chinese University of Hong Kong, Shenzhen, China |
| Pseudocode | Yes | Algorithm 1 Neural Tangent Kernel Spectrum-Aware Pruning (NTK-SAP) (a hedged sketch of the general idea is given after this table) |
| Open Source Code | Yes | Our code is available at https://github.com/YiteWang/NTK-SAP. |
| Open Datasets | Yes | We use CIFAR-10, CIFAR-100, Tiny-ImageNet and ImageNet. They do not contain personally identifiable information or offensive content. |
| Dataset Splits | No | The paper uses standard public datasets (CIFAR-10, CIFAR-100, Tiny-ImageNet, ImageNet) and refers to existing training protocols, but it does not explicitly provide percentages, sample counts, or specific references for how these datasets were split into training, validation, and test sets within the paper's text. |
| Hardware Specification | Yes | All of our experiments were run on NVIDIA V100s. Experiments on CIFAR-10/100 and Tiny-ImageNet datasets were run on a single GPU at a time. We use 2 and 4 GPUs for ResNet-18 and ResNet-50 in ImageNet experiments, respectively. |
| Software Dependencies | No | The paper states, 'We use the torchvision implementations' and 'Our code is based on the original code of Synflow,' but it does not provide specific version numbers for these or any other software components. |
| Experiment Setup | Yes | Target datasets, models, and sparsity ratios. ... Details of training hyperparameters can be found in Appendix A. ... We prune networks using a batch size of 256 for CIFAR-10/100 and Tiny-ImageNet datasets and a batch size of 128 for ImageNet experiments. ... Table 2: Training hyper-parameters used in this work. Network, Dataset, Epochs, Batch, Optimizer, Momentum, LR, LR drop, Weight decay, Initialization. |
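
The Pseudocode and Experiment Setup rows quote the paper's Algorithm 1 (Neural Tangent Kernel Spectrum-Aware Pruning) and its pruning batch sizes without reproducing the procedure itself. The PyTorch sketch below is a minimal, hedged illustration of the general idea described in the quoted excerpts: data-agnostic foresight pruning that scores weights through a fixed-weight NTK trace surrogate evaluated on random Gaussian inputs. It is not the authors' Algorithm 1; the `MaskedLinear` module, `ntk_trace_surrogate`, `prune_by_ntk_score`, the single weight perturbation (the paper averages over several random weight configurations), and all hyperparameter defaults are assumptions introduced for this example. The released repository at https://github.com/YiteWang/NTK-SAP is the authoritative reference.

```python
# Hedged sketch, NOT the authors' Algorithm 1: it only illustrates the high-level
# recipe the quoted excerpts describe -- data-agnostic foresight pruning that scores
# weights by their effect on a fixed-weight NTK trace surrogate, estimated from
# random Gaussian inputs and a small random weight perturbation. The toy
# MaskedLinear module, eps, rounds, the density schedule, and the single-perturbation
# estimate are all illustrative assumptions.
import copy

import torch
import torch.nn as nn


class MaskedLinear(nn.Module):
    """Linear layer whose weights are gated by an explicit, differentiable mask."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) / in_features ** 0.5)
        self.mask = nn.Parameter(torch.ones(out_features, in_features))

    def forward(self, x):
        return x @ (self.weight * self.mask).t()


def ntk_trace_surrogate(model, x, eps=1e-2):
    """Finite-difference surrogate for the fixed-weight NTK trace:
    ||f(x; theta + delta) - f(x; theta)||^2 / eps^2 with delta ~ N(0, eps^2 I)."""
    perturbed = copy.deepcopy(model)
    for orig, pert in zip(model.modules(), perturbed.modules()):
        if isinstance(orig, MaskedLinear):
            with torch.no_grad():
                pert.weight.add_(eps * torch.randn_like(pert.weight))
            pert.mask = orig.mask  # share masks so gradients reach the original model
    return ((perturbed(x) - model(x)) ** 2).sum() / eps ** 2


def prune_by_ntk_score(model, sparsity=0.9, rounds=5, batch=256, in_dim=32):
    """Multi-shot pruning: score mask entries by |d surrogate / d mask| and zero the lowest."""
    masks = [m.mask for m in model.modules() if isinstance(m, MaskedLinear)]
    total = sum(m.numel() for m in masks)
    for r in range(1, rounds + 1):
        for m in masks:
            m.grad = None
        x = torch.randn(batch, in_dim)  # data-agnostic: random inputs, no training data
        ntk_trace_surrogate(model, x).backward()
        scores = torch.cat([(m.grad * m).abs().flatten().detach() for m in masks])
        density = (1.0 - sparsity) ** (r / rounds)  # exponential schedule (assumption)
        k = total - int(total * density)  # number of mask entries to zero this round
        if k == 0:
            continue
        threshold = torch.kthvalue(scores, k).values
        with torch.no_grad():
            for m in masks:
                m.masked_fill_((m.grad * m).abs() <= threshold, 0.0)


if __name__ == "__main__":
    net = nn.Sequential(MaskedLinear(32, 64), nn.ReLU(), MaskedLinear(64, 10))
    prune_by_ntk_score(net, sparsity=0.9)
    kept = sum(int(m.mask.sum().item()) for m in net.modules() if isinstance(m, MaskedLinear))
    print(f"weights kept: {kept} / {32 * 64 + 64 * 10}")
```

The default `batch=256` mirrors the pruning batch size quoted in the Experiment Setup row for the CIFAR-scale experiments; the exponential density schedule across rounds is a common choice in iterative foresight pruning and is likewise an assumption here rather than a detail taken from the paper.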