Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Picking Winning Tickets Before Training by Preserving Gradient Flow
Authors: Chaoqi Wang, Guodong Zhang, Roger Grosse
ICLR 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically investigate the effectiveness of the proposed method with extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and ImageNet, using VGGNet and ResNet architectures. Our method can prune 80% of the weights of a VGG-16 network on ImageNet at initialization, with only a 1.6% drop in top-1 accuracy. Moreover, our method achieves significantly better performance than the baseline at extreme sparsity levels. |
| Researcher Affiliation | Academia | Chaoqi Wang, Guodong Zhang, Roger Grosse; University of Toronto, Vector Institute |
| Pseudocode | Yes | Algorithm 1 Gradient Signal Preservation (GraSP); Algorithm 2 Hessian-gradient Product (a hedged sketch of the Hessian-gradient product appears below the table). |
| Open Source Code | Yes | Our code is made public at: https://github.com/alecwangcq/GraSP. |
| Open Datasets | Yes | We empirically investigate the effectiveness of the proposed method with extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and ImageNet, using VGGNet and ResNet architectures. ... (Krizhevsky, 2009) ... (Deng et al., 2009) |
| Dataset Splits | No | The paper describes training parameters (epochs, learning rate, batch size) and mentions ... |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU specifications, or cloud instance types. |
| Software Dependencies | No | The paper mentions using “Pytorch (Paszke et al., 2017) official implementation” but does not specify a version number for PyTorch or any other software libraries or dependencies. |
| Experiment Setup | Yes | The pruned network is trained with Kaiming initialization (He et al., 2015) using SGD for 160 epochs for CIFAR-10/100, and 300 epochs for Tiny-ImageNet, with an initial learning rate of 0.1 and batch size 128. The learning rate is decayed by a factor of 0.1 at 1/2 and 3/4 of the total number of epochs. ... For ImageNet, we adopt the Pytorch (Paszke et al., 2017) official implementation, but we used more epochs for training according to Liu et al. (2019). Specifically, we train the pruned networks with SGD for 150 epochs, and decay the learning rate by a factor of 0.1 every 50 epochs. (A hedged sketch of this schedule appears below the table.) |
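
For context on the Pseudocode row, the sketch below illustrates the core operation behind Algorithm 2, a Hessian-gradient product computed by double backpropagation in PyTorch, together with a GraSP-style θ ⊙ (Hg) saliency score. This is a minimal sketch under stated assumptions, not the paper's exact Algorithms 1 and 2; the released code at the repository above may differ in batching, normalization, and the direction in which scores are thresholded.

```python
import torch

def hessian_grad_product(loss, params):
    """H g via double backprop: differentiate g^T stop_grad(g) w.r.t. params.

    Minimal sketch of the operation behind Algorithm 2; the paper's exact
    procedure (e.g. how mini-batches are drawn) may differ.
    """
    # First backward pass, keeping the graph so we can differentiate again.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    # Scalar g^T stop_grad(g); its gradient w.r.t. params is H g.
    g_dot_g = sum((g * g.detach()).sum() for g in grads)
    return torch.autograd.grad(g_dot_g, params)

def grasp_scores(loss, params):
    """GraSP-style saliency: score each weight by theta * (H g).

    The sign convention and whether low- or high-scoring weights are pruned
    follow the paper/repository and are not reproduced here.
    """
    hg = hessian_grad_product(loss, params)
    return [p.detach() * h for p, h in zip(params, hg)]
```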
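
The Experiment Setup row quotes the CIFAR-10/100 training recipe. As a rough illustration, here is a hedged PyTorch sketch of that schedule (SGD, 160 epochs, initial learning rate 0.1, batch size 128, decay by 0.1 at 1/2 and 3/4 of training). The stand-in model and the momentum and weight-decay values are placeholders that the excerpt does not specify.

```python
import torch
import torch.nn as nn

# Stand-in for the pruned VGG/ResNet; the real architectures come from the paper.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

epochs, batch_size = 160, 128      # 300 epochs for Tiny-ImageNet per the excerpt
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)  # assumed values
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[epochs // 2, 3 * epochs // 4], gamma=0.1)

for epoch in range(epochs):
    # ... one pass over the CIFAR training set with `optimizer` would go here ...
    scheduler.step()
```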