Representational Continuity for Unsupervised Continual Learning
Authors: Divyam Madaan, Jaehong Yoon, Yuanchun Li, Yunxin Liu, Sung Ju Hwang
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a systematic study analyzing the learned feature representations and show that unsupervised visual representations are surprisingly more robust to catastrophic forgetting, consistently achieve better performance, and generalize better to out-of-distribution tasks than SCL. Furthermore, we find that UCL achieves a smoother loss landscape through qualitative analysis of the learned representations and learns meaningful feature representations. Additionally, we propose Lifelong Unsupervised Mixup (LUMP), a simple yet effective technique that interpolates between the current task and previous tasks' instances to alleviate catastrophic forgetting for unsupervised representations. We release our code online. Table 1 shows the evaluation results for supervised and unsupervised representations learnt by SimSiam (Chen & He, 2021) across various continual learning strategies. (A minimal sketch of the LUMP interpolation appears after this table.) |
| Researcher Affiliation | Collaboration | Divyam Madaan (1), Jaehong Yoon (2,3), Yuanchun Li (5,6), Yunxin Liu (5,6), Sung Ju Hwang (2,4); (1) New York University, (2) KAIST, (3) Microsoft Research, (4) AITRICS, (5) Institute for AI Industry Research (AIR), (6) Tsinghua University; divyam.madaan@nyu.edu, {jaehong.yoon,sjhwang82}@kaist.ac.kr, liyuanchun@air.tsinghua.edu.cn, liuyunxin@air.tsinghua.edu.cn |
| Pseudocode | No | The paper does not contain explicitly labeled pseudocode or algorithm blocks. It provides mathematical formulations and descriptions of methods. |
| Open Source Code | Yes | We release our code online. |
| Open Datasets | Yes | Split CIFAR-10 (Krizhevsky, 2012) consists of two random classes out of the ten classes for each task. Split CIFAR-100 (Krizhevsky, 2012) consists of five random classes out of the 100 classes for each task. Split Tiny-ImageNet is a variant of the ImageNet dataset (Deng et al., 2009) containing five random classes out of the 100 classes for each task, with images sized 64 × 64 pixels. (A task-split construction sketch appears after this table.) |
| Dataset Splits | No | The paper does not explicitly provide specific percentages or counts for training, validation, and testing splits. It mentions training on a sequence of tasks and evaluation with a KNN classifier. |
| Hardware Specification | No | The paper mentions using a 'single-head ResNet-18' architecture but does not specify any particular hardware components like CPU or GPU models used for experiments. |
| Software Dependencies | No | The paper mentions using the 'DER (Buzzega et al., 2020) open-source codebase' and referring to the original implementations for 'SimSiam' and 'Barlow Twins' but does not specify software versions for programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We follow the hyperparameter setup of Buzzega et al. (2020) for all the SCL strategies and tune them for the UCL representation learning strategies. All the learned representations are evaluated with a KNN classifier (Wu et al., 2018) across three independent runs. Further, we use the hyperparameters obtained by SimSiam for training UCL strategies with Barlow Twins to analyze the sensitivity of UCL to hyperparameters and for a fair comparison between different methods. We train all the UCL methods for 200 epochs and evaluate with the KNN classifier (Wu et al., 2018). We provide the hyperparameters in detail in Table A.5. (A KNN-evaluation sketch appears after this table.) |
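
The Research Type row quotes the paper's description of LUMP, which interpolates current-task instances with replayed past-task instances before applying the unsupervised objective. Below is a minimal sketch of that interpolation, assuming a PyTorch setup, a Beta-sampled mixing coefficient, and a placeholder `ucl_loss` standing in for any SimSiam-style objective; the function names and the alpha value are illustrative assumptions, not taken from the authors' released code.

```python
import torch

def lump_step(model, ucl_loss, x_current, x_buffer, alpha=0.1):
    """Sketch of one LUMP-style update: mix current-task images with
    replayed past-task images, then apply the unsupervised loss.

    x_current / x_buffer: pairs of augmented views (view1, view2) for the
    current batch and the replay-buffer batch (shapes must match).
    ucl_loss: any instance-wise unsupervised objective, e.g. a
    SimSiam-style loss over two augmented views (assumed interface).
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()  # mixing coefficient
    view1 = lam * x_current[0] + (1.0 - lam) * x_buffer[0]        # mix first views
    view2 = lam * x_current[1] + (1.0 - lam) * x_buffer[1]        # mix second views
    return ucl_loss(model, view1, view2)
```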
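The Open Datasets row describes the split protocol (two random classes per task for Split CIFAR-10, five per task for Split CIFAR-100 and Split Tiny-ImageNet). The sketch below shows one plausible way to build such task subsets with torchvision; the class ordering, seed, and use of `Subset` are assumptions and need not match the paper's exact splits.

```python
import numpy as np
from torchvision import datasets
from torch.utils.data import Subset

def make_split_tasks(dataset, classes_per_task, seed=0):
    """Partition a classification dataset into continual-learning tasks,
    each holding `classes_per_task` randomly chosen classes (illustrative)."""
    targets = np.array(dataset.targets)
    classes = np.unique(targets)
    rng = np.random.default_rng(seed)
    rng.shuffle(classes)                      # random class-to-task assignment
    tasks = []
    for start in range(0, len(classes), classes_per_task):
        task_classes = classes[start:start + classes_per_task]
        idx = np.where(np.isin(targets, task_classes))[0]
        tasks.append(Subset(dataset, idx))    # subset containing only these classes
    return tasks

# Example: Split CIFAR-10 as five tasks of two classes each
cifar10 = datasets.CIFAR10(root="./data", train=True, download=True)
split_cifar10 = make_split_tasks(cifar10, classes_per_task=2)
```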
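The Experiment Setup row notes that all learned representations are evaluated with a KNN classifier (Wu et al., 2018). The following is a hedged sketch of such an evaluation over frozen encoder features, using cosine similarity and a majority vote over the k nearest training samples; the value of k, the voting rule, and the assumption that the encoder already lives on `device` are not taken from the paper's exact protocol.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def knn_eval(encoder, train_loader, test_loader, k=200, device="cuda"):
    """KNN evaluation sketch: classify test images by cosine similarity of
    their frozen-encoder features to training features (assumed protocol)."""
    encoder.eval()
    feats, labels = [], []
    for x, y in train_loader:
        feats.append(F.normalize(encoder(x.to(device)), dim=1))  # L2-normalized features
        labels.append(y.to(device))
    feats, labels = torch.cat(feats), torch.cat(labels)

    correct = total = 0
    for x, y in test_loader:
        q = F.normalize(encoder(x.to(device)), dim=1)
        sim = q @ feats.T                      # cosine similarity to all training features
        nn_idx = sim.topk(k, dim=1).indices    # indices of the k nearest training samples
        pred = labels[nn_idx].mode(dim=1).values  # majority vote over neighbor labels
        correct += (pred == y.to(device)).sum().item()
        total += y.size(0)
    return correct / total
```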