Progressive Skeletonization: Trimming more fat from a network at initialization
Authors: Pau de Jorge, Amartya Sanyal, Harkirat Behl, Philip Torr, Grégory Rogez, Puneet K. Dokania
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical analysis on a large suite of experiments shows that our approach, while providing at least as good a performance as other recent approaches at moderate pruning levels, provides remarkably improved performance at higher pruning levels (it can remove up to 99.5% of parameters while keeping the networks trainable). |
| Researcher Affiliation | Collaboration | Pau de Jorge (University of Oxford & NAVER LABS Europe); Amartya Sanyal (University of Oxford & The Alan Turing Institute, London, UK); Harkirat S. Behl (University of Oxford); Philip H. S. Torr (University of Oxford); Grégory Rogez (NAVER LABS Europe); Puneet K. Dokania (University of Oxford & Five AI Limited) |
| Pseudocode | Yes | Algorithm 1: FORCE/Iter SNIP algorithm to find a pruning mask (a hedged sketch of this iterative procedure is given after this table). |
| Open Source Code | No | In the case of FORCE and Iter SNIP, we adapt the same public implementation of SNIP as Wang et al. (2020). As for GRASP, we use their public code. (Footnotes 6 and 7 link to these public repositories, but they are not presented as the authors' own released code for their specific methods). |
| Open Datasets | Yes | We present experiments on CIFAR-10/100 (Krizhevsky et al., 2009), which consists of 60k 32×32 colour images divided into 10/100 classes, and also on the ImageNet challenge ILSVRC-2012 (Russakovsky et al., 2015) and its smaller version Tiny-ImageNet, which respectively consist of 1.2M/1k and 100k/200 images/classes. |
| Dataset Splits | Yes | We separate 10% of the training data for validation and report results on the test set. (A split sketch is given after this table.) |
| Hardware Specification | Yes | A Tesla K40m GPU is named in a figure caption: "Right: Wall time to compute pruning masks for CIFAR10/Resnet50/Tesla K40m vs acc at 99.5% sparsity". |
| Software Dependencies | No | The paper mentions using PyTorch ('Automatic differentiation in pytorch, 2017' by Paszke et al.) and adapting specific model implementations (e.g., for Resnet and VGG) which are typically in PyTorch. However, it does not specify version numbers for PyTorch, Python, or other software libraries. |
| Experiment Setup | Yes | For CIFAR datasets, we train Resnet50 and VGG19 architectures for 350 epochs with a batch size of 128. We start with a learning rate of 0.1 and divide it by 10 at epochs 150 and 250. As optimizer we use SGD with momentum 0.9 and weight decay 5×10⁻⁴. (A training-schedule sketch is given after this table.) |
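
The pseudocode row refers to Algorithm 1 (FORCE/Iter SNIP). Below is a minimal PyTorch sketch of an Iter-SNIP-style iterative mask search under stated assumptions: the saliency proxy |θ · ∂L/∂θ| computed on a single batch, the global top-k threshold, and the exponential sparsity schedule are illustrative simplifications of the paper's Algorithm 1, not the authors' implementation, and `saliency_scores` / `iterative_pruning_mask` are hypothetical helper names.

```python
import torch
import torch.nn.functional as F


def saliency_scores(model, batch, mask):
    """Per-weight saliency |theta * dL/dtheta| on one batch, with weights
    removed by the current mask zeroed out beforehand (so they stay pruned)."""
    inputs, targets = batch
    params = [p for p in model.parameters() if p.requires_grad]
    with torch.no_grad():
        for p, m in zip(params, mask):
            p.mul_(m)  # keep already-pruned weights at zero
    loss = F.cross_entropy(model(inputs), targets)
    grads = torch.autograd.grad(loss, params)
    return [(p * g).abs() for p, g in zip(params, grads)]


def iterative_pruning_mask(model, batch, final_sparsity=0.995, num_steps=20):
    """Grow the pruning mask progressively towards `final_sparsity`,
    keeping the globally top-scoring weights at each intermediate level."""
    params = [p for p in model.parameters() if p.requires_grad]
    total = sum(p.numel() for p in params)
    mask = [torch.ones_like(p) for p in params]
    for t in range(1, num_steps + 1):
        # Exponential schedule: keep fraction decays as (1 - final_sparsity)^(t/T).
        keep = max(1, int(total * (1.0 - final_sparsity) ** (t / num_steps)))
        scores = saliency_scores(model, batch, mask)
        flat = torch.cat([s.flatten() for s in scores])
        threshold = torch.topk(flat, keep).values.min()
        mask = [(s >= threshold).float() for s in scores]
    return mask
```

At the last step the keep fraction reaches 1 − final_sparsity (0.5% of weights for 99.5% sparsity); the resulting mask would then be applied to the freshly initialized network before standard training.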
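For the reported 90/10 train/validation split, the following is a minimal sketch assuming torchvision's CIFAR-10 loader; the transform and the random seed are illustrative choices, not taken from the paper.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()  # the paper's augmentation pipeline is not specified here
full_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
val_size = len(full_train) // 10                 # hold out 10% of training data for validation
train_set, val_set = random_split(
    full_train, [len(full_train) - val_size, val_size],
    generator=torch.Generator().manual_seed(0))  # illustrative seed
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)
```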
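The experiment-setup row maps directly onto a standard PyTorch optimizer/scheduler configuration. The sketch below reuses `train_set` from the split sketch above and uses torchvision's off-the-shelf ResNet-50 as a stand-in (the paper adapts ResNet-50/VGG-19 for CIFAR); only the hyperparameters (350 epochs, batch size 128, SGD with momentum 0.9, weight decay 5e-4, learning rate 0.1 divided by 10 at epochs 150 and 250) come from the paper.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import models

model = models.resnet50(num_classes=10)  # stand-in; the paper adapts the architecture for CIFAR
loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[150, 250], gamma=0.1)

for epoch in range(350):
    model.train()
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()  # learning rate drops by 10x after epochs 150 and 250
```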