Long Live the Lottery: The Existence of Winning Tickets in Lifelong Learning

Authors: Tianlong Chen, Zhenyu Zhang, Sijia Liu, Shiyu Chang, Zhangyang Wang

ICLR 2021

Reproducibility assessment. Each variable below is listed with its result and the supporting LLM response.

Research Type: Experimental
LLM Response: For validating our proposal, we conduct extensive experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets for class-incremental learning (Rebuffi et al., 2017). The results demonstrate the existence and the high competitiveness of lifelong tickets. Our best lifelong tickets (found by bottom-up pruning and lottery teaching) achieve comparable or better performance across all sequential tasks, with as few as 3.64% parameters, compared to state-of-the-art dense models.

Researcher Affiliation: Collaboration
LLM Response: Tianlong Chen (1*), Zhenyu Zhang (2*), Sijia Liu (3,4), Shiyu Chang (4), Zhangyang Wang (1); (1) University of Texas at Austin, (2) University of Science and Technology of China, (3) Michigan State University, (4) MIT-IBM Watson AI Lab, IBM Research.

Pseudocode: Yes
LLM Response: Algorithm 1 (Top-Down Pruning) and Algorithm 2 (Bottom-Up Pruning).
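
The two pseudocode listings are not reproduced here. As a rough illustration only, the sketch below shows a single global magnitude-pruning round in PyTorch, the primitive that a top-down (iteratively shrinking a dense network) or bottom-up pruning schedule would be built around; the function name, prune ratio, and layer selection are assumptions, not the authors' Algorithm 1 or 2.

```python
import torch
import torch.nn as nn


def magnitude_prune(model: nn.Module, prune_ratio: float = 0.2) -> dict:
    """Zero out the prune_ratio fraction of smallest-magnitude weights in
    Conv2d/Linear layers (global threshold) and return the binary masks.
    Hypothetical helper, not the paper's Algorithm 1 or 2."""
    weights = [m.weight.data for m in model.modules()
               if isinstance(m, (nn.Conv2d, nn.Linear))]
    scores = torch.cat([w.abs().flatten() for w in weights])
    k = max(1, int(prune_ratio * scores.numel()))
    threshold = torch.kthvalue(scores, k).values  # global magnitude cutoff

    masks = {}
    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            mask = (module.weight.data.abs() > threshold).float()
            module.weight.data.mul_(mask)  # apply the mask in place
            masks[name] = mask
    return masks
```

Per the experiment setup quoted below, such pruning rounds are interleaved with retraining; the bottom-up variant combined with lottery teaching is the configuration reported to reach the 3.64%-parameter tickets quoted above.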

Open Source Code: Yes
LLM Response: Codes available at https://github.com/VITA-Group/Lifelong-Learning-LTH.

Open Datasets: Yes
LLM Response: We evaluate our proposed lifelong tickets on three datasets: CIFAR-10, CIFAR-100, and Tiny-ImageNet. ... All queried unlabeled data for CIFAR-10/CIFAR-100 are from the 80 Million Tiny Images dataset (Torralba et al., 2008), and for Tiny-ImageNet are from the ImageNet dataset (Krizhevsky et al., 2012).

Dataset Splits: Yes
LLM Response: For all three datasets, we randomly split the original training dataset into training and validation with a ratio of 9:1.
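
As an illustration of the 9:1 split, the sketch below uses torchvision's CIFAR-10 loader and torch.utils.data.random_split; the transform, seed, and handling of CIFAR-100/Tiny-ImageNet are assumptions, since the excerpt does not specify them.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()  # placeholder; the paper's augmentations are not quoted
full_train = datasets.CIFAR10(root="./data", train=True,
                              download=True, transform=transform)

n_val = len(full_train) // 10            # 10% held out for validation
n_train = len(full_train) - n_val        # remaining 90% for training
train_set, val_set = random_split(
    full_train, [n_train, n_val],
    generator=torch.Generator().manual_seed(0))  # fixed seed is an assumption
```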

Hardware Specification: Yes
LLM Response: All of our experiments are conducted on NVIDIA GTX 1080-Ti GPUs.

Software Dependencies: No
LLM Response: The paper mentions using ResNet18 as a backbone and Stochastic Gradient Descent (SGD) for training, but it does not specify any software names with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x, or specific library versions) that would be necessary for reproducibility.

Experiment Setup: Yes
LLM Response: Models are trained using Stochastic Gradient Descent (SGD) with 0.9 momentum and 5e-4 weight decay. For the 100-epoch training, a multi-step learning rate schedule is used, starting from 0.01 and decayed by a factor of 10 at epochs 60 and 80. During the iterative pruning, we retrain the model for 30 epochs using a fixed learning rate of 10e-4. The batch size for both labeled and unlabeled data is 128.
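
As a quick reference, here is a minimal PyTorch sketch of the quoted optimization setup (SGD with momentum 0.9, weight decay 5e-4, initial learning rate 0.01, 10x decay at epochs 60 and 80 over 100 epochs). The ResNet-18 backbone, class count, and training-loop wiring are assumptions for illustration, not the authors' code.

```python
import torch
from torchvision.models import resnet18

model = resnet18(num_classes=10)  # class count depends on the incremental task split

optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.01,          # initial learning rate from the excerpt
                            momentum=0.9,
                            weight_decay=5e-4)

# Multi-step schedule: multiply the learning rate by 0.1 at epochs 60 and 80.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 80], gamma=0.1)

for epoch in range(100):
    # ... one pass over labeled (and unlabeled) batches of size 128 goes here ...
    scheduler.step()
```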