SWAP: Sparse Entropic Wasserstein Regression for Robust Network Pruning

Authors: Lei You, Hei Victor Cheng

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments performed on various networks and datasets show comparable performance of SWAP with state-of-the-art (SoTA) network pruning algorithms. Our proposed method outperforms the SoTA when the network size or the target sparsity is large; the gain is even larger in the presence of noisy gradients, possibly from noisy data, analog memory, or adversarial attacks. Notably, our proposed method achieves a 6% gain in accuracy and an 8% gain in testing loss for MobileNetV1 with less than one-fourth of the network parameters remaining.
Researcher Affiliation | Academia | Lei You, Department of Engineering Technology, Technical University of Denmark, Ballerup, DK-2750, Denmark, leiyo@dtu.dk; Hei Victor Cheng, Department of Electrical and Computer Engineering, Aarhus University, Aarhus, DK-8200, Denmark, hvc@ece.au.dk
Pseudocode | Yes | Algorithm 1: Sparse Entropic WAsserstein Regression Pruning (SWAP) (a generic Sinkhorn sketch follows the table below)
Open Source Code | Yes | The code is available at https://github.com/youlei202/Entropic-Wasserstein-Pruning
Open Datasets | Yes | MLPNet on MNIST (LeCun et al., 1998), ResNet20 (200K parameters) and ResNet50 (25M parameters) (He et al., 2016) trained on CIFAR10 (Krizhevsky et al., 2009), and MobileNetV1 (Howard et al., 2017) (4.2M parameters) trained on ImageNet (Deng et al., 2009). (A data-loading sketch follows the table below.)
Dataset Splits | No | No explicit mention of training/validation/test splits by percentage or count was found. The paper references the datasets but not the specific splits needed for reproduction.
Hardware Specification | Yes | The models MLPNet, ResNet20, and MobileNetV1 underwent a pre-training phase of 100 epochs utilizing 4 NVIDIA Tesla V100 32 GB GPUs connected with NVLink. ... For the pruning process, we either utilized 2 NVIDIA Tesla V100 32 GB GPUs with NVLink or a single Tesla A100 PCIe (available in 40 or 80 GB configurations).
Software Dependencies | No | No specific software dependencies with version numbers were mentioned.
Experiment Setup | Yes | In Table 1, we set the number of pruning stages of LR and EWR to 15 for MLPNet and ResNet20, and 10 for MobileNetV1. The sparsity levels k_1, k_2, ..., k_T in Algorithm 1 follow an exponential gradual pruning schedule k_t = k_T + (k_0 - k_T)(1 - t/T), with the initial sparsity k_0 set to zero. The Fisher sample size setup follows (Chen et al., 2022, Table 2), shown as Table 4 of this paper below. ... Throughout the paper, we set λ in the optimization problem (6) to 0.01. The regularization multiplier ε is set to 1 unless specified otherwise. (A schedule sketch follows directly below the table.)
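
The "Experiment Setup" row quotes a gradual sparsity schedule that interpolates from the initial sparsity k_0 = 0 to the target k_T over T pruning stages. Below is a minimal Python sketch of that interpolation, assuming the form k_t = k_T + (k_0 - k_T)(1 - t/T) as reconstructed above; the paper may apply a higher power to the (1 - t/T) factor, and the function name sparsity_schedule is ours, not taken from the released code.

```python
def sparsity_schedule(k_target, num_stages, k_init=0.0):
    """Gradual sparsity levels k_1..k_T interpolating from k_init to k_target.

    Assumes k_t = k_T + (k_0 - k_T) * (1 - t/T); the paper's schedule may
    raise the (1 - t/T) factor to a higher power for a faster ramp-up.
    """
    T = num_stages
    return [k_target + (k_init - k_target) * (1 - t / T) for t in range(1, T + 1)]

# Example: 15 pruning stages (the count reported for MLPNet/ResNet20 above)
# with an illustrative 90% target sparsity.
print([round(k, 3) for k in sparsity_schedule(0.90, 15)])
```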
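The "Pseudocode" row only names Algorithm 1 (SWAP). For orientation, the sketch below is a generic entropy-regularized optimal-transport (Sinkhorn) solver of the kind that underlies entropic Wasserstein regression; it illustrates the role of a regularization multiplier like ε, but it is not the authors' Algorithm 1, and sinkhorn_plan and its arguments are names introduced here for illustration.

```python
import numpy as np

def sinkhorn_plan(a, b, C, eps=1.0, n_iters=200):
    """Entropy-regularized OT plan between histograms a (n,) and b (m,).

    C is the (n, m) pairwise cost matrix; eps is the entropic regularization
    strength (playing the role a multiplier like the paper's epsilon would).
    """
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):             # alternating Sinkhorn scaling updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]   # transport plan P = diag(u) K diag(v)

# Toy usage: transport cost between outputs of a dense and a pruned layer.
rng = np.random.default_rng(0)
x, y = rng.normal(size=(8, 4)), rng.normal(size=(8, 4))
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # squared-Euclidean costs
P = sinkhorn_plan(np.full(8, 1 / 8), np.full(8, 1 / 8), C, eps=1.0)
print(P.sum(), (P * C).sum())  # plan mass ~1, entropic OT objective value
```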
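For readers assembling the benchmarks in the "Open Datasets" row, the sketch below loads MNIST and CIFAR10 through the standard torchvision APIs. The normalization constants are common community defaults; the paper's exact preprocessing, as well as the MLPNet and ResNet20 definitions, are not specified in this report.

```python
import torch
from torchvision import datasets, transforms

# Common default preprocessing; the paper's exact transforms are not stated here.
mnist_tf = transforms.Compose([transforms.ToTensor(),
                               transforms.Normalize((0.1307,), (0.3081,))])
cifar_tf = transforms.Compose([transforms.ToTensor(),
                               transforms.Normalize((0.4914, 0.4822, 0.4465),
                                                    (0.2470, 0.2435, 0.2616))])

mnist_train = datasets.MNIST("data", train=True, download=True, transform=mnist_tf)
cifar_train = datasets.CIFAR10("data", train=True, download=True, transform=cifar_tf)

train_loader = torch.utils.data.DataLoader(cifar_train, batch_size=128, shuffle=True)
images, labels = next(iter(train_loader))
print(images.shape, labels.shape)  # torch.Size([128, 3, 32, 32]) torch.Size([128])
```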