A Signal Propagation Perspective for Pruning Neural Networks at Initialization
Authors: Namhoon Lee, Thalaiyasingam Ajanthan, Stephen Gould, Philip H. S. Torr
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our modifications to the existing pruning at initialization method lead to improved results on all tested network models for image classification tasks. Furthermore, we empirically study the effect of supervision for pruning and demonstrate that our signal propagation perspective, combined with unsupervised pruning, can be useful in various scenarios where pruning is applied to non-standard arbitrarily-designed architectures. |
| Researcher Affiliation | Academia | Namhoon Lee¹, Thalaiyasingam Ajanthan², Stephen Gould², Philip H. S. Torr¹; ¹University of Oxford, ²Australian National University; ¹{namhoon,phst}@robots.ox.ac.uk, ²{thalaiyasingam.ajanthan, stephen.gould}@anu.edu.au |
| Pseudocode | No | The paper contains mathematical derivations and descriptions of processes but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code can be found here: https://github.com/namhoonlee/spp-public. |
| Open Datasets | Yes | Throughout experiments, we evaluate pruning results on MNIST, CIFAR-10, and Tiny-ImageNet image classification tasks. |
| Dataset Splits | Yes | For all experiments, we use 10% of the training set for the validation set, which corresponds to 5400, 5000, 9000 images for MNIST, CIFAR-10, Tiny-ImageNet, respectively. |
| Hardware Specification | No | The paper mentions that computations can take "less than a few seconds on a modern computer" but does not provide any specific hardware details such as CPU/GPU models or memory specifications used for running experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) needed to replicate the experiments. |
| Experiment Setup | Yes | For training of the pruned sparse networks, we use SGD with momentum and train up to 80k (for MNIST) or 100k (for CIFAR-10 and Tiny-ImageNet) iterations. The initial learning rate is set to 0.1 and is decayed by 1/10 at every 20k (MNIST) or 25k (CIFAR-10 and Tiny-ImageNet) iterations. The mini-batch size is set to 100, 128, 200 for MNIST, CIFAR-10, Tiny-ImageNet, respectively. A hedged configuration sketch based on these settings follows the table. |
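The "Dataset Splits" and "Experiment Setup" rows together specify a concrete training configuration. The sketch below collects those quoted numbers in one place. It is not the authors' released implementation (see https://github.com/namhoonlee/spp-public); the use of PyTorch and the momentum value of 0.9 are assumptions not stated in the table.

```python
# Minimal sketch of the quoted training setup, not the authors' released code.
# Assumptions beyond the table: PyTorch as the framework and momentum = 0.9.
import torch

# Per-dataset settings quoted in the "Dataset Splits" and "Experiment Setup" rows.
CONFIGS = {
    "mnist":         {"iters": 80_000,  "decay_every": 20_000, "batch": 100, "val_images": 5_400},
    "cifar10":       {"iters": 100_000, "decay_every": 25_000, "batch": 128, "val_images": 5_000},
    "tiny_imagenet": {"iters": 100_000, "decay_every": 25_000, "batch": 200, "val_images": 9_000},
}


def make_optimizer_and_scheduler(model: torch.nn.Module, dataset: str):
    """SGD with momentum, initial LR 0.1, decayed by 1/10 at fixed iteration milestones."""
    cfg = CONFIGS[dataset]
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # momentum value assumed
    milestones = list(range(cfg["decay_every"], cfg["iters"], cfg["decay_every"]))
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=milestones, gamma=0.1)
    return optimizer, scheduler, cfg


if __name__ == "__main__":
    model = torch.nn.Linear(3 * 32 * 32, 10)  # stand-in for a pruned sparse network
    optimizer, scheduler, cfg = make_optimizer_and_scheduler(model, "cifar10")
    print(f"{cfg['iters']} iterations, mini-batch size {cfg['batch']}, "
          f"{cfg['val_images']} validation images held out")
    print("LR decays by 1/10 at iterations:", sorted(scheduler.milestones))
```

With the scheduler stepped once per training iteration, the milestone list reproduces the quoted schedule: a 1/10 decay at every 20k iterations for MNIST and every 25k iterations for CIFAR-10 and Tiny-ImageNet.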