Picking Winning Tickets Before Training by Preserving Gradient Flow

Authors: Chaoqi Wang, Guodong Zhang, Roger Grosse

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically investigate the effectiveness of the proposed method with extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and ImageNet, using VGGNet and ResNet architectures. Our method can prune 80% of the weights of a VGG-16 network on ImageNet at initialization, with only a 1.6% drop in top-1 accuracy. Moreover, our method achieves significantly better performance than the baseline at extreme sparsity levels.
Researcher Affiliation | Academia | Chaoqi Wang, Guodong Zhang, Roger Grosse, University of Toronto, Vector Institute {cqwang, gdzhang, rgrosse}@cs.toronto.edu
Pseudocode | Yes | Algorithm 1 Gradient Signal Preservation (GraSP); Algorithm 2 Hessian-gradient Product. (A minimal scoring sketch follows this table.)
Open Source Code | Yes | Our code is made public at: https://github.com/alecwangcq/GraSP
Open Datasets | Yes | We empirically investigate the effectiveness of the proposed method with extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and ImageNet, using VGGNet and ResNet architectures. ... (Krizhevsky, 2009) ... (Deng et al., 2009)
Dataset Splits | No | The paper describes training parameters (epochs, learning rate, batch size) and mentions the datasets used, but does not specify explicit train/validation/test splits.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU specifications, or cloud instance types.
Software Dependencies | No | The paper mentions using "Pytorch (Paszke et al., 2017) official implementation" but does not specify a version number for PyTorch or any other software libraries or dependencies.
Experiment Setup | Yes | The pruned network is trained with Kaiming initialization (He et al., 2015) using SGD for 160 epochs for CIFAR-10/100, and 300 epochs for Tiny-ImageNet, with an initial learning rate of 0.1 and batch size 128. The learning rate is decayed by a factor of 0.1 at 1/2 and 3/4 of the total number of epochs. ... For ImageNet, we adopt the Pytorch (Paszke et al., 2017) official implementation, but we used more epochs for training according to Liu et al. (2019). Specifically, we train the pruned networks with SGD for 150 epochs, and decay the learning rate by a factor of 0.1 every 50 epochs. (A schedule sketch follows this table.)
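The Pseudocode row cites Algorithm 1 (GraSP) and Algorithm 2 (Hessian-gradient Product). As a rough illustration of how such a score can be computed with double backpropagation, here is a minimal PyTorch sketch; the function name grasp_scores, the cross-entropy loss, and single-batch scoring are assumptions for illustration and may omit details of the authors' released implementation.

import torch
import torch.nn as nn
import torch.autograd as autograd

def grasp_scores(model, inputs, targets):
    """Minimal sketch (assumed): per-weight GraSP-style scores -theta * (H g),
    where H g is a Hessian-gradient product obtained via double backprop."""
    weights = [p for p in model.parameters() if p.requires_grad]
    loss = nn.functional.cross_entropy(model(inputs), targets)

    # First backward pass: gradient g, keeping the graph for a second differentiation.
    grads = autograd.grad(loss, weights, create_graph=True)

    # Hessian-gradient product: d/dtheta of g^T stop_grad(g) equals H g.
    gTg = sum((g * g.detach()).sum() for g in grads)
    Hg = autograd.grad(gTg, weights)

    # Score each weight; per the paper, weights whose removal least reduces
    # gradient flow (the largest scores) are the ones pruned.
    return [-(w.detach() * h) for w, h in zip(weights, Hg)]

In use, the per-layer scores would be flattened into a single vector and the desired fraction of weights with the highest scores pruned globally, yielding the mask applied at initialization.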
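For the Experiment Setup row, the quoted CIFAR schedule (SGD, initial learning rate 0.1, batch size 128, decay by 0.1 at 1/2 and 3/4 of training) maps onto a standard PyTorch milestone scheduler. The sketch below is an assumed reconstruction; the momentum and weight-decay values are not given in the quoted text and are placeholders.

import torch

def build_training_setup(model, total_epochs=160, base_lr=0.1):
    # SGD with lr 0.1 as quoted; momentum and weight decay are assumed placeholders.
    optimizer = torch.optim.SGD(model.parameters(), lr=base_lr,
                                momentum=0.9, weight_decay=5e-4)
    # Decay the learning rate by 0.1 at 1/2 and 3/4 of the total number of epochs.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[total_epochs // 2, (3 * total_epochs) // 4], gamma=0.1)
    return optimizer, scheduler

# CIFAR-10/100: total_epochs=160; Tiny-ImageNet: total_epochs=300 (per the quoted setup).
# For ImageNet (150 epochs, decay every 50 epochs), the analogous choice would be
# torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1).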