Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection
Authors: Mao Ye, Chengyue Gong, Lizhen Nie, Denny Zhou, Adam Klivans, Qiang Liu
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Practically, we improve prior arts of network pruning on learning compact neural architectures on ImageNet, including ResNet, MobileNetV2/V3, and ProxylessNet. Our theory and empirical results on MobileNet suggest that we should fine-tune the pruned subnetworks to leverage the information from the large model... |
| Researcher Affiliation | Collaboration | ¹Department of Computer Science, The University of Texas at Austin; ²Department of Statistics, The University of Chicago; ³Google Research. |
| Pseudocode | Yes | Algorithm 1 Layer-wise Greedy Subnetwork Selection (a hedged code sketch of the greedy loop follows the table) |
| Open Source Code | Yes | Code is available at https://github.com/lushleaf/Network-Pruning-Greedy-Forward-Selection. |
| Open Datasets | Yes | We use ILSVRC2012, a subset of ImageNet (Deng et al., 2009) which consists of about 1.28 million training images and 50,000 validation images with 1,000 different classes. |
| Dataset Splits | Yes | We use ILSVRC2012, a subset of ImageNet (Deng et al., 2009) which consists of about 1.28 million training images and 50,000 validation images with 1,000 different classes. |
| Hardware Specification | No | The paper mentions training 'on 4 GPUs' but does not specify the make or model of the GPUs or any other hardware components. |
| Software Dependencies | No | The paper describes optimization algorithms and schedules (e.g., 'SGD optimizer with Nesterov momentum 0.9', 'cosine schedule') but does not specify software dependencies with version numbers (e.g., PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | We use the standard SGD optimizer with Nesterov momentum 0.9 and weight decay 5 × 10^-5. For ResNet, we use a fixed learning rate 2.5 × 10^-4. For the other architectures, following the original settings (Cai et al., 2019; Sandler et al., 2018), we decay the learning rate using a cosine schedule (Loshchilov & Hutter, 2017) starting from 0.01. We fine-tune the subnetwork for 150 epochs with batch size 512 on 4 GPUs. We resize images to 224×224 resolution and adopt the standard data augmentation scheme (mirroring and shifting). (A hedged configuration sketch follows the table.) |
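The Pseudocode row above refers to Algorithm 1 (layer-wise greedy subnetwork selection): starting from an empty layer, repeatedly add the neuron whose inclusion most reduces the loss until the target width is reached. Below is a minimal Python sketch of that greedy loop, assuming a caller-supplied `evaluate_loss` function and a boolean channel mask (both names are illustrative, not from the authors' released code), and omitting the paper's exact re-weighting of the selected neurons.

```python
# Hedged sketch of layer-wise greedy forward selection (cf. Algorithm 1).
# `evaluate_loss` and `mask` are assumed/illustrative names, not the authors' API.
import torch


def greedy_forward_selection(evaluate_loss, num_neurons, target_size):
    """Greedily grow a subnetwork for one layer.

    evaluate_loss: callable taking a boolean mask over this layer's neurons and
                   returning the loss of the model restricted to those neurons.
    num_neurons:   width of the layer in the large (pretrained) network.
    target_size:   number of neurons to keep in the pruned layer.
    """
    selected = []
    mask = torch.zeros(num_neurons, dtype=torch.bool)

    while len(selected) < target_size:
        best_idx, best_loss = None, float("inf")
        # Try each not-yet-selected neuron and keep the one that lowers the loss most.
        for i in range(num_neurons):
            if mask[i]:
                continue
            mask[i] = True
            loss = evaluate_loss(mask)
            mask[i] = False
            if loss < best_loss:
                best_idx, best_loss = i, loss
        mask[best_idx] = True
        selected.append(best_idx)

    return selected
```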
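The Experiment Setup row quotes SGD with Nesterov momentum 0.9, weight decay 5 × 10^-5, a fixed learning rate of 2.5 × 10^-4 for ResNet, and a cosine schedule starting from 0.01 for the other architectures. Since the paper does not name its software stack (see the Software Dependencies row), the following PyTorch/torchvision sketch of that configuration is an assumption, including the mapping of "mirroring and shifting" onto specific torchvision transforms.

```python
# Hedged PyTorch sketch of the fine-tuning configuration quoted in the table.
# The framework choice and the exact augmentation transforms are assumptions.
import torch
from torchvision import transforms


def make_optimizer_and_scheduler(model, arch="mobilenet_v2", epochs=150):
    if arch.startswith("resnet"):
        # ResNet: fixed learning rate 2.5e-4, no schedule.
        opt = torch.optim.SGD(model.parameters(), lr=2.5e-4,
                              momentum=0.9, nesterov=True, weight_decay=5e-5)
        sched = None
    else:
        # Other architectures: cosine decay from an initial learning rate of 0.01.
        opt = torch.optim.SGD(model.parameters(), lr=0.01,
                              momentum=0.9, nesterov=True, weight_decay=5e-5)
        sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    return opt, sched


# 224x224 inputs, with "mirroring and shifting" approximated here by a random
# resized crop plus horizontal flip (an assumption about the exact transforms).
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```

The remaining quoted details (150 fine-tuning epochs, batch size 512 split across 4 GPUs) would sit in the training loop and data loader; the GPU models are not specified in the paper (see the Hardware Specification row).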