Provable Contrastive Continual Learning

Authors: Yichen Wen, Zhiquan Tan, Kaipeng Zheng, Chuanlong Xie, Weiran Huang

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Overall, our contributions are listed as follows. (1) We provide theoretical performance guarantees for the contrastive continual learning scheme. We identify that the overall performance of the final learned model on all seen tasks can be bounded by a function of the series of training losses with the distillation coefficient; (2) We propose an efficient algorithm, CILA, which uses an adaptive distillation coefficient λt (replace λ with λt in Figure 1) for each task t; (3) We conduct extensive experiments to validate the efficacy of our algorithm, and the results strongly support our theory.
Researcher Affiliation | Academia | (1) MIFA Lab, Qing Yuan Research Institute, SEIEE, Shanghai Jiao Tong University; (2) Beijing Normal University; (3) Department of Mathematical Sciences, Tsinghua University; (4) Shanghai AI Laboratory.
Pseudocode | Yes | Algorithm 1, CILA: Contrastive Incremental Learning with Adaptive distillation.
Open Source Code | No | The paper does not provide a specific link or explicit statement about the availability of its source code.
Open Datasets | Yes | Specifically, for Class-IL and Task-IL, we utilized Seq-CIFAR-10 and Seq-Tiny-ImageNet datasets. Seq-CIFAR-10 is a modified version of the CIFAR-10 (Krizhevsky, 2009) dataset... Similarly, Seq-Tiny-ImageNet is an adapted version of the Tiny-ImageNet (Le & Yang, 2015) dataset... For Domain-IL, we employed R-MNIST, which is a variant of the MNIST (Lecun et al., 1998) dataset.
Dataset Splits | No | The paper does not explicitly specify the training, validation, and test splits (e.g., percentages or sample counts) for the datasets used in its experiments.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | In our training process, we employed buffers of sizes 200 and 500. The base distillation coefficient λ0 is set to one, following the default configuration of Co2L (Cha et al., 2021). For all experiments, a linear classifier is trained for a fixed number of epochs; we adopt 100 epochs to align with prior work.
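
To make the training objective quoted in the Research Type and Pseudocode rows more concrete, below is a minimal PyTorch sketch of one contrastive continual learning step, where a supervised contrastive loss is combined with an instance-wise relation distillation term weighted by a per-task coefficient lambda_t. The simplified loss functions and the way lambda_t is supplied are assumptions for illustration; they are not the paper's exact asymmetric formulation or its adaptive rule.

```python
# Minimal sketch (PyTorch) of one training step that combines a supervised
# contrastive loss with a distillation term weighted by lambda_t.
# Simplified stand-ins; not the paper's exact asymmetric SupCon / IRD losses.
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, temperature=0.5):
    # Supervised contrastive loss over L2-normalized features (single-view, simplified).
    z = F.normalize(features, dim=1)
    sim = z @ z.T / temperature
    logits = sim - sim.max(dim=1, keepdim=True).values.detach()
    same = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    eye = torch.eye(len(labels), device=z.device)
    pos = same * (1.0 - eye)                               # positives, excluding self
    exp_logits = torch.exp(logits) * (1.0 - eye)           # denominator excludes self
    log_prob = logits - torch.log(exp_logits.sum(1, keepdim=True) + 1e-12)
    return -((pos * log_prob).sum(1) / pos.sum(1).clamp(min=1)).mean()

def relation_distill_loss(feat_new, feat_old, temperature=0.5):
    # Align the batch similarity structure of the current model with that of
    # the frozen model kept from the previous task (relation-distillation style).
    sim_old = F.normalize(feat_old, dim=1) @ F.normalize(feat_old, dim=1).T / temperature
    sim_new = F.normalize(feat_new, dim=1) @ F.normalize(feat_new, dim=1).T / temperature
    p_old = F.softmax(sim_old, dim=1)
    return -(p_old * F.log_softmax(sim_new, dim=1)).sum(1).mean()

def training_step(model, frozen_past_model, x, y, lambda_t, optimizer):
    feat = model(x)
    with torch.no_grad():
        feat_past = frozen_past_model(x)
    loss = supcon_loss(feat, y) + lambda_t * relation_distill_loss(feat, feat_past)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Under this view, Co2L corresponds to holding the coefficient fixed across tasks (the base λ0 = 1 mentioned in the Experiment Setup row), while CILA recomputes lambda_t before each new task t.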
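The Open Datasets row refers to sequential variants of standard benchmarks. As a rough illustration (the paper's exact class ordering, task count, and augmentations are not given in the quoted text), the sketch below splits CIFAR-10 into five two-class tasks, which is the common Seq-CIFAR-10 construction; torchvision is assumed to be available.

```python
# Hypothetical sketch: build a Seq-CIFAR-10-style task sequence by splitting
# CIFAR-10 into 5 tasks of 2 classes each. Class order and transforms are
# illustrative, not taken from the paper.
import numpy as np
from torch.utils.data import Subset
from torchvision import datasets, transforms

def make_seq_cifar10(root="./data", classes_per_task=2, train=True):
    base = datasets.CIFAR10(root=root, train=train, download=True,
                            transform=transforms.ToTensor())
    targets = np.array(base.targets)
    tasks = []
    for t in range(10 // classes_per_task):
        task_classes = range(t * classes_per_task, (t + 1) * classes_per_task)
        idx = np.where(np.isin(targets, list(task_classes)))[0]
        tasks.append(Subset(base, idx.tolist()))
    return tasks  # tasks[0] holds classes {0, 1}, tasks[1] holds {2, 3}, ...
```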
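Finally, the Experiment Setup row mentions replay buffers of size 200 and 500, a base coefficient λ0 = 1, and a linear classifier trained for 100 epochs on top of the learned representation. A minimal linear-probe sketch consistent with that description is given below; the encoder interface, feature dimension, optimizer settings, and data loader are hypothetical placeholders.

```python
# Minimal linear-probe sketch: keep the learned encoder frozen and train a
# linear classifier for 100 epochs. `encoder`, `feat_dim`, `num_classes`,
# and `train_loader` are assumed placeholders.
import torch
import torch.nn as nn

def linear_probe(encoder, train_loader, feat_dim, num_classes,
                 epochs=100, lr=1e-3, device="cpu"):
    encoder.eval()                                  # representation stays frozen
    clf = nn.Linear(feat_dim, num_classes).to(device)
    opt = torch.optim.SGD(clf.parameters(), lr=lr, momentum=0.9)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                feats = encoder(x)                  # frozen features
            loss = ce(clf(feats), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return clf
```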