SWAP: Sparse Entropic Wasserstein Regression for Robust Network Pruning
Authors: Lei You, Hei Victor Cheng
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments performed on various networks and datasets show comparable performance of SWAP with state-of-the-art (SoTA) network pruning algorithms. Our proposed method outperforms the SoTA when the network size or the target sparsity is large; the gain is even larger in the presence of noisy gradients, possibly from noisy data, analog memory, or adversarial attacks. Notably, our proposed method achieves a gain of 6% improvement in accuracy and 8% improvement in testing loss for MobileNetV1 with less than one-fourth of the network parameters remaining. |
| Researcher Affiliation | Academia | Lei You, Department of Engineering Technology, Technical University of Denmark, Ballerup, DK-2750, Denmark, leiyo@dtu.dk; Hei Victor Cheng, Department of Electrical and Computer Engineering, Aarhus University, Aarhus, DK-8200, Denmark, hvc@ece.au.dk |
| Pseudocode | Yes | Algorithm 1 Sparse Entropic WAsserstein Regression Pruning (SWAP) |
| Open Source Code | Yes | The code is available at https://github.com/youlei202/Entropic-Wasserstein-Pruning |
| Open Datasets | Yes | MLPNet on MNIST (LeCun et al., 1998), ResNet20 (200K parameters) and ResNet50 (25M parameters) (He et al., 2016) trained on CIFAR10 (Krizhevsky et al., 2009), and MobileNetV1 (Howard et al., 2017) (4.2M parameters) trained on ImageNet (Deng et al., 2009). |
| Dataset Splits | No | No explicit mention of training/validation/test splits by percentage or count was found. The paper references datasets but not their specific splits for reproduction. |
| Hardware Specification | Yes | The models MLPNet, ResNet20, and MobileNetV1 underwent a pre-training phase of 100 epochs utilizing 4 NVIDIA Tesla V100 32 GB GPUs connected with NVLink. ... For the pruning process, we either utilized 2 NVIDIA Tesla V100 32 GB GPUs with NVLink or a single Tesla A100 PCIe (available in 40 GB or 80 GB configurations). |
| Software Dependencies | No | No specific software dependencies with version numbers were mentioned. |
| Experiment Setup | Yes | In Table 1, we set the number of pruning stages for LR and EWR to 15 for MLPNet and ResNet20 and to 10 for MobileNetV1. The sparsity levels k_1, k_2, ..., k_T in Algorithm 1 follow an exponential gradual pruning schedule k_t = k_T + (k_0 − k_T)(1 − t/T), with the initial sparsity k_0 set to zero. The Fisher sample size setup follows (Chen et al., 2022, Table 2), shown as Table 4 of this paper below. ... Throughout the paper, we set λ in the optimization problem (6) to 0.01. The regularization multiplier ε is set to 1 unless specified otherwise. (A sketch of this sparsity schedule is given after the table.) |
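
For reference, below is a minimal sketch of a gradual sparsity schedule matching the setup quoted above (initial sparsity k_0 = 0, T = 15 stages for MLPNet/ResNet20, T = 10 for MobileNetV1). The function name, the `power` parameter, and the interpolation form k_t = k_T + (k_0 − k_T)(1 − t/T)^power are illustrative assumptions, not taken from the authors' released code; the repository linked above should be consulted for the exact "exponential" schedule used in the paper.

```python
import numpy as np

def gradual_sparsity_schedule(target_sparsity: float,
                              num_stages: int,
                              initial_sparsity: float = 0.0,
                              power: float = 1.0) -> np.ndarray:
    """Sparsity levels k_1, ..., k_T interpolating from k_0 toward the target k_T.

    Hypothetical helper: the paper states k_0 = 0 and T = 15 (MLPNet, ResNet20)
    or T = 10 (MobileNetV1); the exact curvature of its exponential schedule is
    not reproduced here, so `power` only approximates the shape.
    """
    t = np.arange(1, num_stages + 1)
    decay = (1.0 - t / num_stages) ** power  # power > 1 prunes more aggressively in early stages
    return target_sparsity + (initial_sparsity - target_sparsity) * decay

# Example: a 15-stage schedule toward 90% sparsity, starting from a dense model.
print(gradual_sparsity_schedule(0.9, 15))
```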