Layer-adaptive Sparsity for the Magnitude-based Pruning

Authors: Jaeho Lee, Sejun Park, Sangwoo Mo, Sungsoo Ahn, Jinwoo Shin

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate the effectiveness of LAMP under a diverse experimental setup, encompassing various convolutional neural network architectures (VGG-16, ResNet-18/34, DenseNet-121, EfficientNet-B0) and various image datasets (CIFAR-10/100, SVHN, Restricted ImageNet). In all considered setups, LAMP consistently outperforms the baseline layerwise sparsity selection schemes.
Researcher Affiliation | Academia | Jaeho Lee (KAIST EE), Sejun Park (KAIST AI), Sangwoo Mo (KAIST EE), Sungsoo Ahn (MBZUAI), Jinwoo Shin (KAIST AI & EE); {jaeho-lee,sejun.park,swmo,jinwoos}@kaist.ac.kr, peter.ahn@mbzuai.ac.ae
Pseudocode | Yes | The first three steps can be easily implemented in PyTorch as follows.

    import torch

    def lamp_score(weight):
        # squared norm of the whole layer: sum of all squared weights
        normalizer = weight.norm() ** 2
        # sort weight magnitudes in ascending order
        sorted_weight, sorted_idx = weight.abs().view(-1).sort(descending=False)
        # cumulative sum of squared magnitudes of all strictly smaller weights
        weight_square_cumsum_temp = (sorted_weight ** 2).cumsum(dim=0)
        weight_square_cumsum = torch.zeros(weight_square_cumsum_temp.shape)
        weight_square_cumsum[1:] = weight_square_cumsum_temp[:len(weight_square_cumsum_temp) - 1]
        # divide each magnitude by the root of the sum of squared magnitudes
        # of all weights at least as large as itself
        sorted_weight /= (normalizer - weight_square_cumsum).sqrt()
        # scatter the normalized scores back to the original positions and shape
        score = torch.zeros(weight_square_cumsum.shape)
        score[sorted_idx] = sorted_weight
        score = score.view(weight.shape)
        return score
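As a rough illustration of how such per-layer scores are then used, the sketch below computes scores layer by layer and keeps the globally top-scoring connections; the function name global_lamp_prune and the survival_rate argument are illustrative assumptions, not the authors' API, and the released repository should be consulted for the exact implementation.

    # Illustrative sketch (assumed, not the authors' code): prune all layers
    # jointly by keeping the connections with the highest LAMP scores.
    def global_lamp_prune(weights, survival_rate=0.3):
        scores = [lamp_score(w) for w in weights]             # uses lamp_score above
        flat = torch.cat([s.view(-1) for s in scores])
        k = max(1, int(survival_rate * flat.numel()))         # number of weights to keep
        threshold = flat.sort(descending=True).values[k - 1]  # single global threshold
        return [(s >= threshold).float() for s in scores]     # one binary mask per layer

Applying one global threshold to the normalized scores is what yields layer-adaptive sparsity: layers whose weights score lower overall end up more heavily pruned.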
Open Source Code Yes Code: https://github.com/jaeho-lee/layer-adaptive-sparsity
Open Datasets | Yes | Datasets. We consider the following datasets: CIFAR-10/100 (Krizhevsky & Hinton, 2009), SVHN (Netzer et al., 2011), and Restricted ImageNet (Tsipras et al., 2019).
Dataset Splits | No | The CIFAR-10/100 datasets are augmented with random crops with a padding of 4 and random horizontal flips. We normalize both training and test datasets with constants (0.4914, 0.4822, 0.4465), (0.237, 0.243, 0.261). The paper mentions "training" and "test" datasets but does not provide specific split percentages, counts, or explicit cross-validation details for reproducibility.
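A minimal torchvision sketch of this preprocessing, assuming standard RandomCrop/RandomHorizontalFlip transforms and the normalization constants quoted above; the authors' exact data loaders are only available in their repository.

    import torchvision.transforms as T
    from torchvision.datasets import CIFAR10

    # Assumed reconstruction of the described CIFAR-10/100 preprocessing.
    normalize = T.Normalize((0.4914, 0.4822, 0.4465), (0.237, 0.243, 0.261))
    train_transform = T.Compose([
        T.RandomCrop(32, padding=4),   # random crops with a padding of 4
        T.RandomHorizontalFlip(),      # random horizontal flips
        T.ToTensor(),
        normalize,
    ])
    test_transform = T.Compose([T.ToTensor(), normalize])

    train_set = CIFAR10(root="./data", train=True, download=True, transform=train_transform)
    test_set = CIFAR10(root="./data", train=False, download=True, transform=test_transform)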
Hardware Specification | No | The paper mentions 'Sparse GPU kernels for deep learning' in reference to another work (Gale et al., 2020), but does not specify the hardware (e.g., GPU models, CPU types) used for its own experiments.
Software Dependencies | No | With the exception of the weight rewinding experiment, we use AdamW (Loshchilov & Hutter, 2019) with learning rate 0.0003; we use vanilla Adam with learning rate 0.0003 for the weight rewinding experiment, following the setup of Frankle & Carbin (2019). For other hyperparameters, we follow the PyTorch default setup: β = (0.9, 0.999), wd = 0.01, ε = 10^-8. The paper mentions software components like PyTorch and AdamW but does not provide specific version numbers for any of them.
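Assuming current torch.optim defaults, the described setup corresponds to roughly the following sketch; the model below is a stand-in, and the paper does not tie these settings to a specific PyTorch version.

    import torch

    model = torch.nn.Linear(10, 10)  # stand-in for VGG-16 / ResNet-34 / etc.

    # AdamW with lr = 0.0003 and PyTorch defaults for the remaining hyperparameters
    optimizer = torch.optim.AdamW(
        model.parameters(),
        lr=3e-4,
        betas=(0.9, 0.999),
        eps=1e-8,
        weight_decay=0.01,
    )

    # The weight rewinding experiment instead uses vanilla Adam with the same learning rate
    rewind_optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)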
Experiment Setup | Yes | A EXPERIMENTAL SETUPS. For any implementational details not given in this section, we refer to the code at: https://github.com/jaeho-lee/layer-adaptive-sparsity Optimizer. With the exception of the weight rewinding experiment, we use AdamW (Loshchilov & Hutter, 2019) with learning rate 0.0003; we use vanilla Adam with learning rate 0.0003 for the weight rewinding experiment, following the setup of Frankle & Carbin (2019). For other hyperparameters, we follow the PyTorch default setup: β = (0.9, 0.999), wd = 0.01, ε = 10^-8. Pre-processing. The CIFAR-10/100 datasets are augmented with random crops with a padding of 4 and random horizontal flips. We normalize both training and test datasets with constants (0.4914, 0.4822, 0.4465), (0.237, 0.243, 0.261). ...

Table 1: Optimization details.

    Dataset               Model                       Initial training iter.   Re-training iter.   Batch size
    SVHN                  VGG-16                      40000                    30000               100
    CIFAR-10              {VGG-16, EfficientNet-B0}   50000                    40000               100
    CIFAR-10              DenseNet-121                80000                    60000               100
    CIFAR-100             VGG-16                      60000                    50000               100
    Restricted ImageNet   ResNet-34                   80000                    80000               128
    CIFAR-10              Conv-6 (SNIP)               50000                    40000               128
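For convenience, Table 1 can be transcribed into a small lookup structure like the one below; the dictionary name and field names are illustrative only and do not come from the authors' code.

    # Assumed transcription of Table 1 (illustrative names, not the authors' code).
    OPTIMIZATION_DETAILS = {
        ("SVHN", "VGG-16"):                   {"initial_iters": 40000, "retrain_iters": 30000, "batch_size": 100},
        ("CIFAR-10", "VGG-16"):               {"initial_iters": 50000, "retrain_iters": 40000, "batch_size": 100},
        ("CIFAR-10", "EfficientNet-B0"):      {"initial_iters": 50000, "retrain_iters": 40000, "batch_size": 100},
        ("CIFAR-10", "DenseNet-121"):         {"initial_iters": 80000, "retrain_iters": 60000, "batch_size": 100},
        ("CIFAR-100", "VGG-16"):              {"initial_iters": 60000, "retrain_iters": 50000, "batch_size": 100},
        ("Restricted ImageNet", "ResNet-34"): {"initial_iters": 80000, "retrain_iters": 80000, "batch_size": 128},
        ("CIFAR-10", "Conv-6 (SNIP)"):        {"initial_iters": 50000, "retrain_iters": 40000, "batch_size": 128},
    }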