Provable Contrastive Continual Learning
Authors: Yichen Wen, Zhiquan Tan, Kaipeng Zheng, Chuanlong Xie, Weiran Huang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Overall, our contributions are listed as follows. (1) We provide theoretical performance guarantees for the contrastive continual learning scheme. We identify that the overall performance of the final learned model on all seen tasks can be bounded by a function of the series of training losses with the distillation coefficient; (2) We propose an efficient algorithm, CILA, which uses an adaptive distillation coefficient λt (replace λ with λt in Figure 1) for each task t; (3) We conduct extensive experiments to validate the efficacy of our algorithm, and the results strongly support our theory. (See the loss sketch below the table.) |
| Researcher Affiliation | Academia | 1MIFA Lab, Qing Yuan Research Institute, SEIEE, Shanghai Jiao Tong University 2Beijing Normal University 3Department of Mathematical Sciences, Tsinghua University 4Shanghai AI Laboratory. |
| Pseudocode | Yes | Algorithm 1 CILA: Contrastive Incremental Learning with Adaptive distillation |
| Open Source Code | No | The paper does not provide a specific link or explicit statement about the availability of its source code. |
| Open Datasets | Yes | Specifically, for Class-IL and Task-IL, we utilized Seq-CIFAR-10 and Seq-Tiny-ImageNet datasets. Seq-CIFAR-10 is a modified version of the CIFAR-10 (Krizhevsky, 2009) dataset... Similarly, Seq-Tiny-ImageNet is an adapted version of the Tiny-ImageNet (Le & Yang, 2015) dataset... For Domain-IL, we employed R-MNIST, which is a variant of the MNIST (LeCun et al., 1998) dataset. |
| Dataset Splits | No | The paper does not explicitly specify the training, validation, and test splits (e.g., percentages or sample counts) for the datasets used in its experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | In our training process, we employed buffers of sizes 200 and 500. The base distillation coefficient λ0 is set to one, following the default configuration of Co2L (Cha et al., 2021). For all experiments, a linear classifier is trained for a fixed number of epochs, and we adopt 100 epochs to align with prior work. (See the linear-probe sketch below the table.) |
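
The Research Type and Pseudocode rows describe the core objective behind CILA: a contrastive loss plus a distillation term weighted by an adaptive coefficient λt for each task t. The sketch below is a minimal PyTorch rendering of that structure, not the authors' implementation. The SupCon-style contrastive term and the instance-wise relation distillation term follow the Co2L losses the paper builds on, and `adaptive_lambda` is a hypothetical adaptation rule (a ratio of averaged past losses) standing in for the exact rule of Algorithm 1.

```python
import torch
import torch.nn.functional as F

def sup_contrastive_loss(feats, labels, temperature=0.5):
    """Supervised contrastive loss over L2-normalized projections (SupCon-style)."""
    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.T / temperature
    self_mask = torch.eye(len(feats), dtype=torch.bool, device=feats.device)
    sim = sim.masked_fill(self_mask, -1e9)                       # drop self-similarity
    pos_mask = (labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask).float()
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)   # log-softmax over the batch
    mean_log_prob_pos = (pos_mask * log_prob).sum(1) / pos_mask.sum(1).clamp(min=1)
    return -mean_log_prob_pos.mean()

def ird_loss(feats_new, feats_old, temperature=0.5):
    """Instance-wise relation distillation (Co2L-style): match the similarity
    distribution of the current model to that of the frozen past model."""
    def sim_matrix(feats):
        feats = F.normalize(feats, dim=1)
        sim = feats @ feats.T / temperature
        self_mask = torch.eye(len(feats), dtype=torch.bool, device=feats.device)
        return sim.masked_fill(self_mask, -1e9)
    p_old = F.softmax(sim_matrix(feats_old), dim=1)
    log_p_new = F.log_softmax(sim_matrix(feats_new), dim=1)
    return F.kl_div(log_p_new, p_old, reduction="batchmean")

def adaptive_lambda(prev_distill_losses, prev_contrast_losses, lam0=1.0):
    # Illustrative adaptation rule only (an assumption): scale the base
    # coefficient lambda_0 by the ratio of the averaged past distillation loss
    # to the averaged past contrastive loss. The exact rule for lambda_t is
    # the one given in the paper's Algorithm 1.
    avg_d = sum(prev_distill_losses) / max(len(prev_distill_losses), 1)
    avg_c = sum(prev_contrast_losses) / max(len(prev_contrast_losses), 1)
    return lam0 * avg_d / max(avg_c, 1e-12)

def cila_objective(feats_new, feats_old, labels, lam_t):
    """Per-task objective: contrastive term + lam_t * distillation term."""
    return sup_contrastive_loss(feats_new, labels) + lam_t * ird_loss(feats_new, feats_old)
```

Here `feats_new` and `feats_old` are projections of the same batch from the current and the frozen previous model; the adaptive coefficient is recomputed once per task before training on it.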
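
The Experiment Setup row reports that a linear classifier is trained on top of the learned representation for 100 epochs. The snippet below is a minimal linear-probe sketch under that setup; the encoder, data loader, optimizer choice, and learning rate are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def linear_probe(encoder, train_loader, feat_dim, num_classes,
                 epochs=100, lr=0.1, device="cuda"):
    """Train a linear classifier on frozen features for a fixed number of epochs."""
    encoder.eval()                                                    # representation stays frozen
    clf = nn.Linear(feat_dim, num_classes).to(device)
    opt = torch.optim.SGD(clf.parameters(), lr=lr, momentum=0.9)      # assumed optimizer/hyperparameters
    for _ in range(epochs):                                           # 100 epochs in the reported setup
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                z = encoder(x)                                        # frozen features
            loss = F.cross_entropy(clf(z), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return clf
```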