Data-Efficient Double-Win Lottery Tickets from Robust Pre-training

Authors: Tianlong Chen, Zhenyu Zhang, Sijia Liu, Yang Zhang, Shiyu Chang, Zhangyang Wang

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We comprehensively examine various pre-training mechanisms and find that robust pre-training tends to craft sparser double-win lottery tickets with superior performance over the standard counterparts. For example, on downstream CIFAR-10/100 datasets, we identify double-win matching subnetworks with the standard, fast adversarial, and adversarial pre-training from ImageNet, at 89.26%/73.79%, 89.26%/79.03%, and 91.41%/83.22% sparsity, respectively. (A sketch of the IMP procedure behind these subnetworks follows the table.)
Researcher Affiliation | Collaboration | 1 Department of Electrical and Computer Engineering, University of Texas at Austin; 2 Michigan State University; 3 MIT-IBM Watson AI Lab; 4 University of California, Santa Barbara.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Codes are available at https://github.com/VITA-Group/Double-Win-LTH.
Open Datasets | Yes | All the models are pre-trained with the classification task on the ImageNet source dataset (Krizhevsky et al., 2012). After producing subnetworks from the pre-training task on ImageNet by IMP, we implement both standard and adversarial transfer on three downstream datasets: CIFAR-10 (Krizhevsky & Hinton, 2009), CIFAR-100 (Krizhevsky & Hinton, 2009), and SVHN (Netzer et al., 2011). (A data-loading sketch for these datasets follows the table.)
Dataset Splits | No | The paper mentions using fractions of the training data (e.g., 10%, 1%) and early stopping, but it does not specify how the datasets were split into training, validation, and test sets, nor give explicit percentages or sample counts for a distinct validation set across the experiments.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running its experiments.
Software Dependencies | No | The paper references PyTorch via a URL (https://pytorch.org/vision/stable/models.html) and uses an SGD optimizer, but it does not provide version numbers for any software libraries, frameworks, or dependencies used in the experiments.
Experiment Setup | Yes | On CIFAR-10/100, we train the network for 100 epochs with an initial learning rate of 0.1, decayed by ten times at the 50th and 75th epochs. For SVHN, we start from a 0.01 learning rate and decay it with a cosine annealing schedule over 80 epochs. An SGD optimizer is adopted with 5×10⁻⁴ weight decay and 0.9 momentum, and we use a batch size of 128 for all downstream experiments. For adversarial training, we train the network against an ℓ∞ adversary with 10-step Projected Gradient Descent (PGD-10), using ϵ = 8/255 and α = 2/255. (A code sketch of this setup follows the table.)
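
The subnetworks reported in the Research Type row are produced by iterative magnitude pruning (IMP) on the pre-training task, as quoted in the Open Datasets row. Below is a minimal sketch of that procedure, assuming global magnitude pruning of roughly 20% of the remaining weights per round and a rewind to the pre-trained weights after each round; the per-round pruning ratio, the number of rounds, and the caller-supplied `train_fn` are assumptions for illustration, not values taken from the quoted text.

```python
import torch

def prune_by_magnitude(model, masks, prune_ratio=0.2):
    # Global magnitude pruning: drop the smallest `prune_ratio` fraction of the
    # weights that are still active according to `masks`.
    scores = torch.cat([
        (p.detach().abs() * masks[n]).flatten()
        for n, p in model.named_parameters() if n in masks
    ])
    remaining = scores[scores > 0].sort().values
    threshold = remaining[int(prune_ratio * remaining.numel())]
    for n, p in model.named_parameters():
        if n in masks:
            masks[n] = (masks[n].bool() & (p.detach().abs() > threshold)).float()
    return masks

def find_imp_masks(build_model, pretrained_state, train_fn, rounds=10, prune_ratio=0.2):
    # Iterative magnitude pruning on the source task: train, prune a fraction of
    # the remaining weights, then rewind the survivors to the pre-trained weights
    # before the next round.  `train_fn(model, masks)` is caller-supplied and is
    # expected to apply the masks during its forward passes.
    model = build_model()
    model.load_state_dict(pretrained_state)
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters() if p.dim() > 1}
    for _ in range(rounds):
        train_fn(model, masks)                    # standard or adversarial training round
        masks = prune_by_magnitude(model, masks, prune_ratio)
        model.load_state_dict(pretrained_state)   # rewind to the (robust) pre-trained weights
    return masks
```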
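
The three downstream datasets named in the Open Datasets row are all available through torchvision. A minimal loading sketch follows; the resize to 224×224, the `data` directory, and the number of workers are assumptions (the quoted text does not specify the preprocessing), while the batch size of 128 comes from the Experiment Setup row.

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Assumed preprocessing: resize the 32x32 downstream images to the input
# resolution of an ImageNet-pre-trained backbone and convert to tensors.
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
])

cifar10 = datasets.CIFAR10("data", train=True, download=True, transform=transform)
cifar100 = datasets.CIFAR100("data", train=True, download=True, transform=transform)
svhn = datasets.SVHN("data", split="train", download=True, transform=transform)

# Batch size 128 is the value reported in the Experiment Setup row.
loaders = {
    name: DataLoader(ds, batch_size=128, shuffle=True, num_workers=4)
    for name, ds in {"cifar10": cifar10, "cifar100": cifar100, "svhn": svhn}.items()
}
```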
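
The Experiment Setup row maps directly onto standard PyTorch components. The sketch below wires the quoted hyperparameters into an SGD optimizer, a step or cosine learning-rate schedule, and a PGD-10 ℓ∞ attack; the random start for PGD and the assumption that inputs lie in [0, 1] are not stated in the quote and should be treated as assumptions.

```python
import torch
import torch.nn.functional as F

def make_optimizer_and_scheduler(model, dataset):
    # Hyperparameters from the Experiment Setup row, expressed with standard
    # PyTorch schedulers for the described decay schedules.
    if dataset in ("cifar10", "cifar100"):
        opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
        sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[50, 75], gamma=0.1)  # 100 epochs
    else:  # svhn
        opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
        sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=80)                  # 80 epochs
    return opt, sched

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    # PGD-10 under an l_inf budget, as described in the Experiment Setup row.
    # Random start and clamping to [0, 1] are assumptions of this sketch.
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
        delta.data = (x + delta.data).clamp(0, 1) - x  # keep adversarial images valid
    return (x + delta).detach()
```

In adversarial fine-tuning, each training batch would be replaced by `pgd_attack(model, x, y)` before the usual cross-entropy update; standard fine-tuning simply skips that step.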