Progressive Skeletonization: Trimming more fat from a network at initialization

Authors: Pau de Jorge, Amartya Sanyal, Harkirat Behl, Philip Torr, Grégory Rogez, Puneet K. Dokania

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical analysis on a large suite of experiments shows that our approach, while providing at least as good a performance as other recent approaches on moderate pruning levels, provides remarkably improved performance on higher pruning levels (could remove up to 99.5% of parameters while keeping the networks trainable).
Researcher Affiliation | Collaboration | Pau de Jorge (University of Oxford & NAVER LABS Europe); Amartya Sanyal (University of Oxford & The Alan Turing Institute, London, UK); Harkirat S. Behl (University of Oxford); Philip H. S. Torr (University of Oxford); Grégory Rogez (NAVER LABS Europe); Puneet K. Dokania (University of Oxford & Five AI Limited)
Pseudocode | Yes | Algorithm 1: the FORCE/Iter SNIP procedure to find a pruning mask (a hedged sketch of this loop is given after the table).
Open Source Code | No | In the case of FORCE and Iter SNIP, we adapt the same public implementation of SNIP as Wang et al. (2020). As for GRASP, we use their public code. (Footnotes 6 and 7 link to these public repositories, but they are not presented as the authors' own released code for their specific methods.)
Open Datasets | Yes | We present experiments on CIFAR-10/100 (Krizhevsky et al., 2009), which consists of 60k 32×32 colour images divided into 10/100 classes, and also on the ImageNet challenge ILSVRC-2012 (Russakovsky et al., 2015) and its smaller version Tiny-ImageNet, which respectively consist of 1.2M/1k and 100k/200 images/classes.
Dataset Splits | Yes | We separate 10% of the training data for validation and report results on the test set.
Hardware Specification | Yes | Right: Wall time to compute pruning masks for CIFAR10/Resnet50/Tesla K40m vs acc at 99.5% sparsity
Software Dependencies | No | The paper mentions using PyTorch ('Automatic differentiation in pytorch, 2017' by Paszke et al.) and adapting specific model implementations (e.g., for Resnet and VGG), which are typically in PyTorch. However, it does not specify version numbers for PyTorch, Python, or other software libraries.
Experiment Setup | Yes | For CIFAR datasets, we train Resnet50 and VGG19 architectures for 350 epochs with a batch size of 128. We start with a learning rate of 0.1 and divide it by 10 at 150 and 250 epochs. As optimizer we use SGD with momentum 0.9 and weight decay 5×10^-4. (A minimal training-loop sketch under these settings follows the table.)
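
The pruning-mask procedure referenced in the Pseudocode row (Algorithm 1) is not reproduced on this page. Below is a minimal, hedged sketch of an Iter SNIP-style loop based on the paper's description: saliencies |w · dL/dw| are recomputed on the progressively pruned network while the kept-weight budget decays towards the target sparsity over several rounds. All function and variable names are illustrative, not the authors' code; FORCE differs mainly in that already-pruned weights retain a nonzero saliency and can be revived.

# Illustrative sketch (not the authors' code) of an Iter SNIP-style iterative
# pruning-at-initialization loop, following the paper's description of Algorithm 1.
import copy
import torch
import torch.nn.functional as F

def masked_saliency(model, mask, data, target):
    # Saliency |w * dL/dw| of each prunable weight, computed on the masked network.
    net = copy.deepcopy(model)
    params = [p for p in net.parameters() if p.dim() > 1]  # conv/linear weights only
    for p, m in zip(params, mask):
        p.data.mul_(m)                                     # apply the current mask
    loss = F.cross_entropy(net(data), target)
    grads = torch.autograd.grad(loss, params)
    return [(p * g).abs() for p, g in zip(params, grads)]

def iter_snip_mask(model, data, target, final_density=0.005, num_steps=20):
    # Shrink the kept-weight budget over num_steps rounds with an exponential schedule,
    # re-estimating saliencies on the progressively pruned network each round.
    params = [p for p in model.parameters() if p.dim() > 1]
    total = sum(p.numel() for p in params)
    mask = [torch.ones_like(p) for p in params]
    for t in range(1, num_steps + 1):
        density_t = final_density ** (t / num_steps)       # decays from ~1 to final_density
        k = max(1, int(density_t * total))                 # number of weights kept this round
        sal = masked_saliency(model, mask, data, target)
        flat = torch.cat([s.flatten() for s in sal])
        threshold = torch.topk(flat, k).values.min()
        mask = [(s >= threshold).float() for s in sal]
    return mask

A typical call would be mask = iter_snip_mask(net, x_batch, y_batch, final_density=0.005) on one (or several averaged) mini-batches before applying the mask and training as usual; the paper's exact schedule, batch handling, and the FORCE variant may differ from this sketch.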
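
For the Experiment Setup row, the reported CIFAR hyperparameters map onto a standard PyTorch training configuration. The sketch below is an assumption-laden reconstruction: only the batch size, optimizer settings, learning-rate schedule, and 10% validation split come from the quoted text, while the data augmentation, the exact CIFAR-adapted Resnet50/VGG19 definitions, and evaluation code are placeholders.

# Hedged reconstruction of the reported CIFAR training setup; the model choice and
# transforms are placeholders, not the authors' exact implementation.
import torch
import torch.nn.functional as F
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader, random_split

transform = T.ToTensor()                                    # augmentation details are an assumption
full_train = torchvision.datasets.CIFAR10('./data', train=True, download=True, transform=transform)
val_size = len(full_train) // 10                            # 10% of training data held out for validation
train_set, val_set = random_split(full_train, [len(full_train) - val_size, val_size])
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

model = torchvision.models.vgg19(num_classes=10)            # stand-in for the CIFAR-adapted VGG19/Resnet50
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[150, 250], gamma=0.1)

for epoch in range(350):                                    # 350 epochs; LR divided by 10 at epochs 150 and 250
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()

In the paper's pipeline this training loop would be run on the pruned network, i.e., with the mask from Algorithm 1 applied to the weights and kept fixed throughout training.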