Effective Continual Learning for Text Classification with Lightweight Snapshots
Authors: Jue Wang, Dajie Dong, Lidan Shou, Ke Chen, Gang Chen
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments over various task sequences show that our approach effectively mitigates catastrophic forgetting and outperforms all baselines. We conduct extensive experiments over several task sequences, including cross-dataset and cross-language task sequences. The results show that our approach can effectively mitigate catastrophic forgetting under different task settings without using any old training data. |
| Researcher Affiliation | Academia | 1Key Lab of Intelligent Computing Based Big Data of Zhejiang Province, Zhejiang University 2College of Computer Science and Technology, Zhejiang University |
| Pseudocode | Yes | Algorithm 1: Overall training procedure. |
| Open Source Code | Yes | Code available at: https://github.com/LorrinWWW/Snapshot. |
| Open Datasets | Yes | THUCNews dataset (Sun et al. 2016), AG’s news corpus (Zhang, Zhao, and LeCun 2015), Yelp reviews (Asghar 2016), Amazon reviews (McAuley and Leskovec 2013), and DBPedia dataset (Zhang, Zhao, and LeCun 2015). ... SNIPS benchmark (Coucke et al. 2018) |
| Dataset Splits | No | The paper uses standard datasets but does not explicitly report the percentages or counts of the training, validation, and test splits used in its experiments. It mentions 'tuning hyperparameters by performing a grid search', which implies a validation set, but the split procedure is not detailed. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory specifications) used to run the experiments. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and fine-tuning a BERT model, but it does not specify version numbers for any key software components or libraries (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | We use the Adam optimizer, and set the learning rate to 2e-5 for the global model and 1e-4 for the adapter-based snapshot. We set the training batch size to 32. We tune the hyperparameters by performing a grid search over d ∈ [12, 96], T ∈ [1, 6], and M ∈ [1, 4]. Our approach works well with a bottleneck size d = 48, temperature T = 3, and maximum number of snapshots for each training step M = 3. We train each task for one epoch and report the average results over 3 runs with different random seeds. |
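
The hyperparameters quoted in the Experiment Setup row translate into a short PyTorch sketch. This is a minimal illustration, not the authors' released implementation: the helper names (`make_optimizer`, `distillation_loss`) are hypothetical, and the temperature-scaled KL loss is shown only as the standard way a distillation temperature such as T = 3 is typically applied; the paper's exact snapshot loss may differ.

```python
# Hypothetical sketch of the reported training configuration.
# Helper names are illustrative, not from the authors' repository.
import torch
import torch.nn.functional as F

# Hyperparameters quoted in the paper's experiment setup.
BOTTLENECK_D = 48    # adapter bottleneck size, grid-searched over [12, 96]
TEMPERATURE = 3.0    # distillation temperature, grid-searched over [1, 6]
MAX_SNAPSHOTS = 3    # snapshots per training step, grid-searched over [1, 4]
BATCH_SIZE = 32
LR_GLOBAL = 2e-5     # learning rate for the BERT-based global model
LR_ADAPTER = 1e-4    # learning rate for the adapter-based snapshot

def make_optimizer(global_model, adapter_snapshot):
    """Adam with separate learning rates for the two parameter groups."""
    return torch.optim.Adam([
        {"params": global_model.parameters(), "lr": LR_GLOBAL},
        {"params": adapter_snapshot.parameters(), "lr": LR_ADAPTER},
    ])

def distillation_loss(student_logits, snapshot_logits, t=TEMPERATURE):
    """Temperature-scaled KL divergence between the current model and a
    snapshot teacher; an assumed form of the loss, shown for illustration."""
    log_p = F.log_softmax(student_logits / t, dim=-1)
    q = F.softmax(snapshot_logits / t, dim=-1)
    return F.kl_div(log_p, q, reduction="batchmean") * (t ** 2)
```

Splitting Adam into two parameter groups mirrors the reported 2e-5 / 1e-4 learning-rate split between the global model and the adapter-based snapshot; training then runs for one epoch per task at batch size 32, averaged over 3 random seeds.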