Powerpropagation: A sparsity inducing weight reparameterisation

Authors: Jonathan Schwarz, Siddhant M. Jayakumar, Razvan Pascanu, Peter E. Latham, Yee Whye Teh

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We now provide an experimental comparison of Powerpropagation to a variety of other techniques, both in the sparsity and continual learning settings. Throughout this section we will be guided by three key questions: (i) Can we provide experimental evidence for inherent sparsity? (ii) If so, can Powerprop. be successfully combined with existing sparsity techniques? (iii) Do improvements brought by Powerprop. translate to measurable advances in Continual Learning? Also, Figure 2 shows this comparison for image classification on the popular CIFAR-10 [67] and ImageNet [68] datasets using a smaller version of AlexNet [3] and ResNet50 [4] respectively. (A sketch of the Powerpropagation reparameterisation follows after this table.)
Researcher Affiliation | Collaboration | Jonathan Schwarz (DeepMind & Gatsby Unit, UCL; schwarzjn@google.com); Siddhant M. Jayakumar (DeepMind & University College London); Razvan Pascanu (DeepMind); Peter E. Latham (Gatsby Unit, UCL); Yee Whye Teh (DeepMind)
Pseudocode | Yes | Algorithm 1: Efficient PackNet (EPN) + Powerpropagation. (A sketch of the PackNet-style masking that EPN builds on follows after this table.)
Open Source Code | Yes | We provide code to reproduce the MNIST results (a) in the accompanying notebook. https://github.com/deepmind/deepmind-research/tree/master/powerpropagation
Open Datasets | Yes | Figure 1a shows the effect of increasing sparsity on the layerwise magnitude-pruning setting for LeNet [40] on MNIST [41]. Also, Figure 2 shows this comparison for image classification on the popular CIFAR-10 [67] and ImageNet [68] datasets.
Dataset Splits | Yes | terminating the search once the sparse model's performance falls short of a minimum accepted target performance γP (computed on a held-out validation set). Also, P_s ← E(X^T, y^T, φ ⊙ M_t) // Validation performance of sparse model (from Algorithm 1, Line 9). (See the sparsity-search sketch after this table.)
Hardware Specification | No | The paper discusses computational costs and efficiency (reducing the computational footprint of models), but does not specify the exact hardware (e.g., GPU/CPU models, types of accelerators) used for running the experiments.
Software Dependencies | No | The paper mentions using Adam [33] as an optimizer, but does not provide specific version numbers for programming languages, libraries, frameworks (like TensorFlow or PyTorch), or other software components used in the experiments.
Experiment Setup | Yes | Finally, it is worth noting that the choice of α does influence the optimal learning rate schedule and best results were obtained after changes to the default schedule. Also, training for 1M steps with Adam [33] (relying on the formulation in Section 2) on each task while allowing 100k retrain steps for PackNet. Also, Algorithm 1 specifies Target performance γ ∈ [0, 1]; Sparsity rates S = [s1, ..., sn]. (A configuration sketch collecting these settings follows after this table.)
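
As context for the "Research Type" row, the following is a minimal NumPy sketch of the Powerpropagation reparameterisation from Section 2 of the paper (w = φ·|φ|^(α−1)). The layer shape, the α value and the helper names are illustrative placeholders, not the paper's configuration.

```python
import numpy as np

def powerprop_weights(phi, alpha):
    """Powerpropagation: effective weights w = phi * |phi|^(alpha - 1).

    alpha = 1 recovers the standard parameterisation; alpha > 1 scales the
    gradient w.r.t. phi by alpha * |phi|^(alpha - 1), so small parameters
    receive small updates and tend to stay near zero -- the "inherent
    sparsity" the experiments probe.
    """
    return phi * np.abs(phi) ** (alpha - 1.0)

def grad_wrt_phi(grad_w, phi, alpha):
    # Chain rule through the reparameterisation:
    # dL/dphi = dL/dw * alpha * |phi|^(alpha - 1).
    return grad_w * alpha * np.abs(phi) ** (alpha - 1.0)

# Illustrative usage on a single dense layer (shape and alpha are placeholders).
rng = np.random.default_rng(0)
phi = rng.normal(scale=0.1, size=(784, 10))
w = powerprop_weights(phi, alpha=2.0)
```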
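Algorithm 1 itself is not reproduced in this summary; for the "Pseudocode" row, here is a hedged sketch of the PackNet-style per-task masking that Efficient PackNet builds on. The function name and the cumulative-mask convention are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def active_weights_for_task(weights, task_masks, task_id):
    """PackNet-style inference for task `task_id`: only weights claimed by
    tasks 0..task_id are used; weights owned by earlier tasks stay frozen
    during later training, which is what prevents forgetting."""
    active = np.zeros_like(weights, dtype=bool)
    for mask in task_masks[: task_id + 1]:
        active |= mask
    return weights * active
```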
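The "Dataset Splits" row quotes the validation-based stopping rule from Algorithm 1. The sketch below shows one plausible reading of that loop, with γP read as a fraction γ of a reference performance P. The `evaluate` callback (returning held-out validation accuracy) and the pruning helper are hypothetical stand-ins; the real algorithm additionally manages per-task masks.

```python
import numpy as np

def magnitude_mask(weights, sparsity):
    """Keep the largest-magnitude (1 - sparsity) fraction of weights."""
    k = int(sparsity * weights.size)
    if k == 0:
        return np.ones_like(weights, dtype=bool)
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.abs(weights) > threshold

def search_sparsity(weights, evaluate, sparsity_rates, gamma, reference_perf):
    """Return the most aggressive mask whose held-out performance P_s stays
    at or above the target gamma * reference_perf (gamma in [0, 1])."""
    best = np.ones_like(weights, dtype=bool)
    for s in sorted(sparsity_rates):          # S = [s1, ..., sn]
        candidate = magnitude_mask(weights, s)
        p_s = evaluate(weights * candidate)   # cf. P_s <- E(X^T, y^T, phi ⊙ M_t)
        if p_s < gamma * reference_perf:      # falls short of the target: stop
            break
        best = candidate
    return best
```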
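Finally, the settings quoted in the "Experiment Setup" row can be collected into a configuration sketch. Only the optimizer and the step counts come from the paper; the remaining values (α, γ, the sparsity rates, the learning-rate schedule) are placeholders, since the paper only notes that the default schedule had to be adapted to the choice of α.

```python
# Hedged experiment-setup sketch; fields marked "placeholder" are illustrative.
config = {
    "optimizer": "adam",                      # Adam [33], as stated in the paper
    "train_steps_per_task": 1_000_000,        # "training for 1M steps ... on each task"
    "packnet_retrain_steps": 100_000,         # "allowing 100k retrain steps for PackNet"
    "alpha": 2.0,                             # Powerpropagation exponent (placeholder)
    "target_performance_gamma": 0.95,         # gamma in [0, 1] (placeholder)
    "sparsity_rates": [0.5, 0.8, 0.9, 0.95],  # S = [s1, ..., sn] (placeholder)
    "learning_rate_schedule": None,           # must be re-tuned per alpha (paper notes this)
}
```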