Sparse Double Descent: Where Network Pruning Aggravates Overfitting

Authors: Zheng He, Zeke Xie, Quanzhi Zhu, Zengchang Qin

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | First, we report the novel sparse double descent phenomenon through extensive experiments. In this section, we conducted extensive experiments to demonstrate sparse double descent with respect to model sparsity.
Researcher Affiliation | Academia | Intelligent Computing and Machine Learning Lab, School of ASEE, Beihang University, Beijing, China; The University of Tokyo; RIKEN Center for AIP.
Pseudocode | No | The paper describes its methods and processes in narrative text and refers to figures, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps.
Open Source Code | Yes | Our code is available at https://github.com/hezheug/sparse-double-descent.
Open Datasets | Yes | We train a fully-connected LeNet-300-100 (LeCun et al., 1998) on MNIST (LeCun et al., 1998), a ResNet-18 (He et al., 2016) on CIFAR-10 or CIFAR-100 (Krizhevsky et al., 2009). We also test a VGG-16 (Simonyan & Zisserman, 2015) on CIFAR datasets (see Figure 20 and 27). In the context of larger model and dataset, we train a ResNet-101 on the Tiny ImageNet dataset (see Figure 30), which is a reduced version of ImageNet (Deng et al., 2009). This dataset is from the Tiny ImageNet Challenge: https://tinyimagenet.herokuapp.com/ (An illustrative data/model loading sketch is given after the hyperparameter table below.)
Dataset Splits | No | The paper consistently refers to 'train' and 'test' accuracy and performance, but it does not explicitly mention or detail a 'validation' dataset split or provide specific percentages for how the data was partitioned into train, validation, and test sets. It describes iterative pruning and retraining processes, but this is distinct from a dataset split.
Hardware Specification | No | The paper mentions 'We run all the MNIST and CIFAR experiments on single GPU and Tiny ImageNet experiments on four GPUs with CUDA 10.1.' However, it does not specify the particular models of the GPUs (e.g., NVIDIA A100, RTX 2080 Ti) or provide details about the CPUs or other hardware components used.
Software Dependencies | No | The paper states 'We adopt standard implementations of LeNet-300-100 from OpenLTH' and mentions using a 'modified version of PyTorch model'. While this implies the use of PyTorch, it does not provide specific version numbers for PyTorch or any other software libraries, frameworks, or operating systems used, which is necessary for reproducibility.
Experiment Setup | Yes | We describe the main experimental setup used throughout this paper. Particularly, we also vary several experimental choices, e.g., models and datasets, pruning strategies, retraining methods and label noise settings, to verify the generalizability of the sparse double descent phenomenon. Preliminaries, experimental details as well as more experimental results are given in the Appendix. Models and datasets. ... Network pruning. ... Pruning strategies. ... Retraining. ... Label noise settings. ... The training hyperparameters used in our experiments are given below:

Network | Dataset | Epochs | Batch | Opt. | Mom. | LR | LR Drop | Drop Factor | Weight Decay | Rewind Iter | LR (finetune) | LR (re-dense)
LeNet-300-100 | MNIST | 200 | 128 | SGD | – | 0.1 | – | – | – | 0 | 0.1 | 0.1
ResNet-18 | CIFAR-10 | 160 | 128 | SGD | 0.9 | 0.1 | 80, 120 | 0.1 | 1e-4 | 1000 | 0.001 | 0.001
ResNet-18 | CIFAR-100 | 160 | 128 | SGD | 0.9 | 0.1 | 80, 120 | 0.1 | 1e-4 | 1000 | 0.001 | 0.001
VGG-16 | CIFAR-10 | 160 | 128 | SGD | 0.9 | 0.1 | 80, 120 | 0.1 | 1e-4 | 2000 | 0.001 | 0.001
VGG-16 | CIFAR-100 | 160 | 128 | SGD | 0.9 | 0.1 | 80, 120 | 0.1 | 1e-4 | 2000 | 0.001 | 0.001
ResNet-101 | Tiny ImageNet | 200 | 512 | SGD | 0.9 | 0.2 | 100, 150 | 0.1 | 1e-4 | 1000 | – | –
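For reference alongside the models-and-datasets row above, here is a minimal sketch, assuming standard torchvision components, of how CIFAR-10 and a CIFAR-style ResNet-18 might be set up. The paper takes LeNet-300-100 from OpenLTH and uses a modified PyTorch ResNet, so the transforms, normalization constants, and the 3x3-stem adaptation below are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch (not the authors' code): CIFAR-10 loaders and a CIFAR-style ResNet-18.
# Transforms, normalization constants, and the 3x3-stem adaptation are assumptions.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

normalize = T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616))
train_tf = T.Compose([T.RandomCrop(32, padding=4), T.RandomHorizontalFlip(), T.ToTensor(), normalize])
test_tf = T.Compose([T.ToTensor(), normalize])

train_set = torchvision.datasets.CIFAR10("data", train=True, download=True, transform=train_tf)
test_set = torchvision.datasets.CIFAR10("data", train=False, download=True, transform=test_tf)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=128, shuffle=False, num_workers=4)

def cifar_resnet18(num_classes: int = 10) -> nn.Module:
    """torchvision ResNet-18 adapted for 32x32 inputs (assumed variant)."""
    model = torchvision.models.resnet18(num_classes=num_classes)
    model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
    model.maxpool = nn.Identity()  # drop the ImageNet-scale downsampling
    return model

model = cifar_resnet18()
```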
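The hyperparameter table maps directly onto a standard PyTorch optimizer and scheduler configuration. The sketch below expresses the ResNet-18 / CIFAR-10 row and adds hypothetical helpers for symmetric label noise and one round of global magnitude pruning via torch.nn.utils.prune, since the paper studies iterative pruning with retraining under label noise; the 20% per-round pruning rate, the noise model, and the training-loop skeleton are assumptions rather than the authors' procedure.

```python
# Sketch only: ResNet-18 / CIFAR-10 row of the table as a PyTorch configuration,
# plus assumed helpers for symmetric label noise and global magnitude pruning.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
import torchvision

model = torchvision.models.resnet18(num_classes=10)  # the CIFAR-adapted variant above would be used in practice
criterion = nn.CrossEntropyLoss()

# Table row: 160 epochs, batch 128, SGD, momentum 0.9, LR 0.1, weight decay 1e-4,
# LR multiplied by the drop factor 0.1 at epochs 80 and 120.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80, 120], gamma=0.1)

def add_symmetric_label_noise(targets: torch.Tensor, noise_rate: float, num_classes: int) -> torch.Tensor:
    """Replace a fraction of training labels with uniformly random classes (assumed noise model)."""
    noisy = targets.clone()
    flip = torch.rand(len(noisy)) < noise_rate
    noisy[flip] = torch.randint(0, num_classes, (int(flip.sum()),))
    return noisy

def prune_round(model: nn.Module, amount: float = 0.2) -> None:
    """Globally remove the smallest-magnitude conv/linear weights (20% per round is an assumption)."""
    params = [(m, "weight") for m in model.modules() if isinstance(m, (nn.Conv2d, nn.Linear))]
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=amount)

def train_one_epoch(loader, device: str = "cpu") -> None:
    """One epoch of standard supervised training; run for 160 epochs after every pruning round."""
    model.to(device).train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

In an iterative-pruning workflow of this kind, prune_round would alternate with full retraining, and for lottery-ticket-style rewinding the surviving weights would be reset to their values at the rewind iteration listed in the table before each retraining pass.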