Powerpropagation: A sparsity inducing weight reparameterisation
Authors: Jonathan Schwarz, Siddhant M. Jayakumar, Razvan Pascanu, Peter E. Latham, Yee Whye Teh
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now provide an experimental comparison of Powerpropagation to a variety of other techniques, both in the sparsity and continual learning settings. Throughout this section we will be guided by three key questions: (i) Can we provide experimental evidence for inherent sparsity? (ii) If so, can Powerprop. be successfully combined with existing sparsity techniques? (iii) Do improvements brought by Powerprop. translate to measurable advances in Continual Learning? and Figure 2 shows this comparison for image classification on the popular CIFAR-10 [67] and ImageNet [68] datasets using a smaller version of AlexNet [3] and ResNet50 [4] respectively. (The reparameterisation itself is sketched below the table.) |
| Researcher Affiliation | Collaboration | Jonathan Schwarz DeepMind & Gatsby Unit, UCL schwarzjn@google.com Siddhant M. Jayakumar DeepMind & University College London Razvan Pascanu DeepMind Peter E. Latham Gatsby Unit, UCL Yee Whye Teh DeepMind |
| Pseudocode | Yes | Algorithm 1: Efficient PackNet (EPN) + Powerpropagation. |
| Open Source Code | Yes | We provide code to reproduce the MNIST results (a) in the accompanying notebook. 3https://github.com/deepmind/deepmind-research/tree/master/powerpropagation |
| Open Datasets | Yes | Figure 1a shows the effect of increasing sparsity on the layerwise magnitude-pruning setting for LeNet [40] on MNIST [41]. and Figure 2 shows this comparison for image classification on the popular CIFAR-10 [67] and ImageNet [68] datasets |
| Dataset Splits | Yes | terminating the search once the sparse model's performance falls short of a minimum accepted target performance γP (computed on a held-out validation set) and P_s ← E(X^T, y^T, φ ⊙ M_t) // Validation performance of sparse model (from Algorithm 1, Line 9). (The validation-gated search is sketched below the table.) |
| Hardware Specification | No | The paper discusses computational costs and efficiency (reducing the computational footprint of models), but does not specify the exact hardware (e.g., GPU/CPU models, types of accelerators) used for running the experiments. |
| Software Dependencies | No | The paper mentions using Adam [33] as an optimizer, but does not provide specific version numbers for programming languages, libraries, frameworks (like TensorFlow or PyTorch), or other software components used in the experiments. |
| Experiment Setup | Yes | Finally, it is worth noting that the choice of α does influence the optimal learning rate schedule and best results were obtained after changes to the default schedule. and training for 1M steps with Adam [33] (relying on the formulation in Section 2) on each task while allowing 100k retrain steps for PackNet. Also, Algorithm 1 specifies Target performance γ ∈ [0, 1]; Sparsity rates S = [s1, . . . , sn]. |
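
Since several rows above quote the Powerpropagation reparameterisation without showing it, here is a minimal sketch of the idea in PyTorch: the layer stores parameters φ and recomputes the effective weight w = φ·|φ|^(α−1) on every forward pass, so α = 1 recovers standard training and α > 1 biases updates towards already-large weights. The class name `PowerpropLinear`, the `alpha` argument and the initialisation shown are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PowerpropLinear(nn.Module):
    """Linear layer trained under the Powerpropagation reparameterisation.

    The layer stores phi and uses the effective weight
        w = phi * |phi| ** (alpha - 1),
    so the gradient w.r.t. phi picks up a factor proportional to
    |phi| ** (alpha - 1): small weights receive small updates and tend to
    stay near zero, which is the sparsity-inducing effect.
    Illustrative sketch only; not the authors' released code.
    """

    def __init__(self, in_features: int, out_features: int, alpha: float = 2.0):
        super().__init__()
        self.alpha = alpha
        # Assumption: initialise phi so the *effective* weights match a
        # standard Kaiming-style init, i.e. phi = sign(w) * |w| ** (1 / alpha).
        w = torch.empty(out_features, in_features)
        nn.init.kaiming_uniform_(w, a=5 ** 0.5)
        self.phi = nn.Parameter(torch.sign(w) * w.abs().pow(1.0 / alpha))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def effective_weight(self) -> torch.Tensor:
        # alpha = 1 recovers an ordinary linear layer.
        return self.phi * self.phi.abs().pow(self.alpha - 1.0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.effective_weight(), self.bias)


# Usage sketch: train as usual (e.g. with Adam, as in the quoted setup);
# sparsify afterwards by magnitude-pruning the *effective* weights.
layer = PowerpropLinear(784, 10, alpha=2.0)
opt = torch.optim.Adam(layer.parameters(), lr=1e-3)
```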
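
The "Dataset Splits" and "Experiment Setup" rows refer to Algorithm 1's validation-gated search over sparsity rates. A rough sketch of that control flow follows; `evaluate`, `magnitude_prune` and `search_sparsity` are hypothetical helpers standing in for the paper's notation (E, the masks M_t and the target performance γ), not the authors' implementation.

```python
import copy

import torch


def magnitude_prune(model: torch.nn.Module, sparsity: float) -> None:
    """Zero the smallest-magnitude entries of each weight matrix (layer-wise)."""
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() < 2:
                continue  # leave biases and scalars dense
            k = int(sparsity * p.numel())
            if k == 0:
                continue
            threshold = p.abs().flatten().kthvalue(k).values
            p.mul_((p.abs() > threshold).float())


def search_sparsity(model, evaluate, sparsity_rates, gamma):
    """Return the sparsest pruned copy whose validation score stays above gamma * P.

    `evaluate(model)` is assumed to return accuracy on a held-out validation
    split; `sparsity_rates` (the S = [s1, ..., sn] of Algorithm 1) should be
    sorted in increasing order.
    """
    dense_perf = evaluate(model)          # P: reference performance of the dense model
    best = None
    for s in sparsity_rates:
        candidate = copy.deepcopy(model)
        magnitude_prune(candidate, s)
        if evaluate(candidate) < gamma * dense_perf:
            break                         # sparse model fell short of gamma * P: stop
        best = candidate
    return best
```

In the continual-learning variant (EPN), the weights surviving the chosen sparsity level would additionally be recorded as a per-task mask and protected on later tasks, which this sketch omits.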