Sparse Winning Tickets are Data-Efficient Image Recognizers

Authors: Mukund Varma T, Xuxi Chen, Zhenyu Zhang, Tianlong Chen, Subhashini Venugopalan, Zhangyang Wang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we empirically show that winning tickets (small subnetworks) obtained via magnitude pruning based on the lottery ticket hypothesis [1], apart from being sparse are also effective recognizers in data-limited regimes. Based on extensive experiments, we find that in low data regimes (datasets of 50-100 examples per class), sparse winning tickets substantially outperform the original dense networks.
Researcher Affiliation | Collaboration | Mukund Varma T (1), Xuxi Chen (2), Zhenyu Zhang (2), Tianlong Chen (2), Subhashini Venugopalan (3), Zhangyang Wang (2); (1) Indian Institute of Technology Madras, (2) University of Texas at Austin, (3) Google
Pseudocode | No | The paper describes the IMP procedure with numbered steps but does not format it as a pseudocode block or algorithm labeled as such.
Open Source Code | Yes | Code is available at https://github.com/VITA-Group/DataEfficientLTH.
Open Datasets | Yes | CIFAR10 [34] is our primary dataset for all analysis experiments. CIFAR10-C [35]. CIFAR10.2 [36]. CLaMM [38], ISIC [39] and EuroSAT [40]. CIFAR100 [34].
Dataset Splits | Yes | CIFAR10 [34] is our primary dataset for all analysis experiments. It consists of 60,000 colour images (train: 50,000, val: 10,000) of size 32x32 split into 10 classes. ... We always evaluate on the full validation set in all our experiments. (See the low-data split sketch after this table.)
Hardware Specification | No | The paper mentions running experiments, but it does not specify any particular hardware components such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using the SGD optimizer and cross-entropy loss, and implicitly relies on a common deep learning framework such as PyTorch, but it does not specify software versions for any libraries, frameworks, or programming languages used.
Experiment Setup | Yes | We use the ResNet model (specifically ResNet-18 unless otherwise specified) in all our experiments. Since we are working with smaller-sized images (often 32x32 or 64x64), we modify the initial convolution layer to a 3x3 kernel, with padding 1 and no max-pooling. For images of size 224 we do not make any modifications. We follow the IMP lottery ticket-finding procedure with 16 pruning iterations where 20% of the weights are pruned after each iteration. In each pruning iteration, the model is trained for 200 epochs to minimize the cross entropy loss using the SGD optimizer with weight decay 0.0005. The initial learning rate is set to 0.1 and then cosine decayed for 200 epochs. For fine-tuning experiments, we use a lower learning rate of 0.001. During the first iteration, we use the rewinding technique and set r = 2 epochs. (See the IMP training-loop sketch after this table.)
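The Dataset Splits row refers to low-data regimes of 50-100 training examples per class while always evaluating on the full CIFAR-10 validation set. The following is a minimal sketch of how such a split could be built, assuming PyTorch and torchvision; the function name, batch sizes, and sampling seed are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a per-class low-data CIFAR-10 split (assumptions: PyTorch +
# torchvision; seed, batch sizes, and sampling procedure are illustrative).
import torch
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

def low_data_cifar10(root="./data", per_class=100, seed=0):
    tf = transforms.ToTensor()
    train_set = datasets.CIFAR10(root, train=True, download=True, transform=tf)
    val_set = datasets.CIFAR10(root, train=False, download=True, transform=tf)

    # Keep `per_class` randomly chosen training images from each of the 10 classes.
    g = torch.Generator().manual_seed(seed)
    targets = torch.tensor(train_set.targets)
    keep = []
    for c in range(10):
        idx = torch.nonzero(targets == c).flatten()
        keep += idx[torch.randperm(len(idx), generator=g)[:per_class]].tolist()

    train_loader = DataLoader(Subset(train_set, keep), batch_size=128, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=256)  # always the full 10,000-image validation set
    return train_loader, val_loader
```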
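The Experiment Setup row specifies 16 IMP iterations, 20% magnitude pruning per iteration, 200 epochs of SGD with weight decay 0.0005 and a cosine-decayed initial learning rate of 0.1, and rewinding to r = 2 epochs. Below is a minimal sketch of that loop, assuming PyTorch/torchvision and the modified ResNet-18 stem described above. It is not the authors' released code: the choice of which tensors are masked, the SGD momentum value, and the global-pruning criterion are assumptions made for illustration.

```python
# Minimal sketch of IMP with early-epoch weight rewinding (assumptions noted in comments).
import copy
import torch
import torch.nn as nn
from torchvision.models import resnet18

def build_model(num_classes=10):
    # ResNet-18 with a 3x3 stem (padding 1) and no max-pooling for 32x32 inputs, as described above.
    model = resnet18(num_classes=num_classes)
    model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
    model.maxpool = nn.Identity()
    return model

def apply_masks(model, masks):
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])

def magnitude_prune(masks, model, rate=0.2):
    # Globally drop `rate` of the *remaining* weights with the smallest magnitude.
    surviving = torch.cat([p.detach().abs().flatten()[masks[n].flatten().bool()]
                           for n, p in model.named_parameters() if n in masks])
    k = max(1, int(rate * surviving.numel()))
    threshold = torch.kthvalue(surviving, k).values
    for n, p in model.named_parameters():
        if n in masks:
            masks[n] = masks[n] * (p.detach().abs() > threshold).float()

def train(model, loader, masks, epochs=200, lr=0.1, rewind_epoch=2):
    # 200 epochs of SGD (weight decay 5e-4) with cosine-decayed learning rate, as in the setup row.
    # Momentum 0.9 is an assumption; it is not stated in the quoted setup.
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=5e-4)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    loss_fn = nn.CrossEntropyLoss()
    rewind_state = None
    for epoch in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
            apply_masks(model, masks)  # keep pruned weights at zero
        sched.step()
        if epoch + 1 == rewind_epoch:
            rewind_state = copy.deepcopy(model.state_dict())  # snapshot for rewinding (r = 2)
    return rewind_state

def imp(train_loader, iterations=16, prune_rate=0.2):
    model = build_model()
    # Assumption: prune all multi-dimensional weight tensors (conv + linear); biases/BN stay dense.
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters() if p.dim() > 1}
    rewind_state = None
    for _ in range(iterations):
        snapshot = train(model, train_loader, masks)
        if rewind_state is None:
            rewind_state = snapshot                # early-training weights from the first run
        magnitude_prune(masks, model, prune_rate)  # remove 20% of the surviving weights
        model.load_state_dict(rewind_state)        # rewind, then retrain under the new mask
        apply_masks(model, masks)
    # The returned model holds rewound weights under the final mask (the "winning ticket");
    # train it once more to evaluate the sparse subnetwork.
    return model, masks
```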