Long Live the Lottery: The Existence of Winning Tickets in Lifelong Learning
Authors: Tianlong Chen, Zhenyu Zhang, Sijia Liu, Shiyu Chang, Zhangyang Wang
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For validating our proposal, we conduct extensive experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets for class-incremental learning (Rebuffi et al., 2017). The results demonstrate the existence and the high competitiveness of lifelong tickets. Our best lifelong tickets (found by bottom-up pruning and lottery teaching) achieve comparable or better performance across all sequential tasks, with as few as 3.64% parameters, compared to state-of-the-art dense models. |
| Researcher Affiliation | Collaboration | Tianlong Chen1*, Zhenyu Zhang2*, Sijia Liu3,4, Shiyu Chang4, Zhangyang Wang1 1University of Texas at Austin,2University of Science and Technology of China 3Michigan State University, 4MIT-IBM Watson AI Lab, IBM Research |
| Pseudocode | Yes | Algorithm 1: Top-Down Pruning; Algorithm 2: Bottom-Up Pruning (see the pruning sketch after the table). |
| Open Source Code | Yes | Codes available at https://github.com/VITA-Group/Lifelong-Learning-LTH. |
| Open Datasets | Yes | We evaluate our proposed lifelong tickets on three datasets: CIFAR-10, CIFAR-100, and Tiny-ImageNet. ... All queried unlabeled data for CIFAR-10/CIFAR-100 are from the 80 Million Tiny Images dataset (Torralba et al., 2008), and for Tiny-ImageNet are from the ImageNet dataset (Krizhevsky et al., 2012). |
| Dataset Splits | Yes | For all three datasets, we randomly split the original training dataset into training and validation with a ratio of 9 : 1 (see the split sketch after the table). |
| Hardware Specification | Yes | All of our experiments are conducted on NVIDIA GTX 1080-Ti GPUs. |
| Software Dependencies | No | The paper mentions using ResNet18 as a backbone and Stochastic Gradient Descent (SGD) for training, but it does not specify any software names with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x, or specific library versions) that would be necessary for reproducibility. |
| Experiment Setup | Yes | Models are trained using Stochastic Gradient Descent (SGD) with 0.9 momentum and 5e-4 weight decay. For the 100-epoch training, a multi-step learning rate schedule is used, starting from 0.01 and decayed by a factor of 10 at epochs 60 and 80. During the iterative pruning, we retrain the model for 30 epochs using a fixed learning rate of 10e-4. The batch size for both labeled and unlabeled data is 128. (See the training-setup sketch after the table.) |
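
The pseudocode row points to the paper's Algorithm 1 (top-down pruning) and Algorithm 2 (bottom-up pruning). As a rough, non-authoritative orientation, the sketch below shows one round of generic iterative magnitude pruning with lottery-ticket rewinding; PyTorch and the helper names (`make_mask`, `rewind_and_mask`, `prune_ratio`, `init_state`) are assumptions for illustration and do not reproduce the lifelong-learning schedules of the paper or the official repository.

```python
# Minimal sketch of one iterative magnitude-pruning round with
# lottery-ticket rewinding (illustrative; not the paper's exact algorithm).
import torch


def make_mask(model, prune_ratio=0.2):
    """Build per-layer binary masks that drop the smallest-magnitude weights."""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() > 1:  # prune conv/linear weight tensors, skip biases and BN
            k = max(1, int(prune_ratio * param.numel()))
            threshold = param.detach().abs().flatten().kthvalue(k).values
            masks[name] = (param.detach().abs() > threshold).float()
    return masks


def rewind_and_mask(model, masks, init_state):
    """Rewind surviving weights to their saved initialization and zero the rest."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.copy_(init_state[name] * masks[name])


# Illustrative usage: save the initialization, train, prune, rewind, retrain.
# init_state = {n: p.detach().clone() for n, p in model.named_parameters()}
# train(model); masks = make_mask(model, prune_ratio=0.2)
# rewind_and_mask(model, masks, init_state); retrain(model)  # repeat per round
```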
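
The dataset-splits row quotes a random 9:1 train/validation split of the original training set. A minimal sketch of such a split, assuming torchvision's CIFAR-10 loader and an arbitrary fixed seed (neither is stated in the paper), could look like this:

```python
# Minimal sketch of the 9:1 train/validation split (seed and transform are assumptions).
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())
n_val = len(full_train) // 10                    # 10% held out for validation
n_train = len(full_train) - n_val                # remaining 90% for training
train_set, val_set = random_split(
    full_train, [n_train, n_val],
    generator=torch.Generator().manual_seed(0))  # fixed seed for reproducibility
```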
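
The experiment-setup row lists SGD with momentum 0.9, weight decay 5e-4, a learning rate of 0.01 decayed by a factor of 10 at epochs 60 and 80 over 100 epochs, and batch size 128. A minimal sketch of that optimizer and schedule, assuming a PyTorch ResNet-18 backbone (the paper does not name its framework), is shown below; the data loading, loss, and lifelong-learning task sequencing are omitted.

```python
# Minimal sketch of the quoted optimizer and learning-rate schedule
# (framework and backbone construction are assumptions).
import torch
from torchvision.models import resnet18

model = resnet18(num_classes=10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 80], gamma=0.1)  # decay 10x at epochs 60 and 80

for epoch in range(100):
    # ... one training pass over batches of 128 labeled/unlabeled samples ...
    scheduler.step()
```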