Online Continual Learning through Mutual Information Maximization

Authors: Yiduo Guo, Bing Liu, Dongyan Zhao

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluation shows that OCM substantially outperforms the online CL baselines. For example, for CIFAR10, OCM improves the accuracy of the best baseline by 13.1%, from 64.1% (baseline) to 77.2% (OCM). ... Empirical evaluation using benchmark datasets MNIST, CIFAR10, CIFAR100 and Tiny Image Net shows that OCM outperforms the state-of-the-art online CL systems markedly.
Researcher Affiliation | Academia | (1) Wangxuan Institute of Computer Technology, Peking University; (2) Artificial Intelligence Institute, Peking University; (3) Department of Computer Science, University of Illinois at Chicago.
Pseudocode | Yes | The pseudo-code of our algorithm is given in Algorithms 1 and 2 in Appendix 3.
Open Source Code | Yes | The code is publicly available at https://github.com/gydpku/OCM.
Open Datasets | Yes | Evaluation data. We use 4 image classification datasets. MNIST (LeCun et al., 1998) has 10 classes with 60,000 examples for training and 10,000 examples for testing. It is split into 5 disjoint tasks with 2 classes per task. CIFAR10 (Krizhevsky & Hinton, 2009) has 10 classes with 50,000 examples for training and 10,000 for testing. CIFAR100 (Krizhevsky & Hinton, 2009) has 100 classes with 50,000 examples for training and 10,000 for testing. Tiny Image Net (Le & Yang, 2015) has 200 classes. (A task-split sketch follows the table below.)
Dataset Splits | No | The paper specifies training and testing data sizes but does not explicitly mention a separate validation split or percentage.
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as CPU/GPU models or memory.
Software Dependencies | No | The paper mentions using
Experiment Setup | Yes | For all datasets, OCM is trained with the Adam optimizer. We set the learning rate as 0.001 and fix the weight decay as 0.0001. Following (Shim et al., 2021), we set each data increment size N to 10 (the size of X_new) for all systems. For the memory buffer batch (X_buf) size N_b, in OCM, we initialize N_b as zero and increase it by seven slots when the system meets a new class. We set the max N_b allowed as 64. ... We set λ as 0.5 (Eq. 9 in main paper). ... We set α as 1 and β as 2 for (2), and α as 0 and β still as 2 for (3). (A training-loop sketch follows the table below.)
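
The "Open Datasets" row describes a class-incremental protocol: MNIST's 10 classes are split into 5 disjoint tasks of 2 classes each. The sketch below shows one way to build such a split. It is illustrative only (the helper name make_class_incremental_tasks is not from the OCM codebase) and assumes a torchvision-style dataset that exposes integer labels via `.targets`.

```python
# Illustrative sketch, not the authors' code: split a torchvision-style
# dataset into disjoint class-incremental tasks, e.g. MNIST's 10 classes
# into 5 tasks of 2 classes each.
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

def make_class_incremental_tasks(dataset, classes_per_task):
    """Return one Subset per task; each task covers a disjoint set of classes."""
    targets = torch.as_tensor(dataset.targets)
    classes = sorted(set(targets.tolist()))
    tasks = []
    for start in range(0, len(classes), classes_per_task):
        task_classes = torch.tensor(classes[start:start + classes_per_task])
        indices = torch.isin(targets, task_classes).nonzero(as_tuple=True)[0]
        tasks.append(Subset(dataset, indices.tolist()))
    return tasks

# download=True fetches MNIST on first run.
train_set = datasets.MNIST(root="data", train=True, download=True,
                           transform=transforms.ToTensor())
tasks = make_class_incremental_tasks(train_set, classes_per_task=2)
assert len(tasks) == 5  # 10 MNIST classes / 2 classes per task
```

The same helper applies to CIFAR10 and CIFAR100 (datasets.CIFAR10 / datasets.CIFAR100 also expose `.targets`), with classes_per_task chosen to match the desired number of tasks.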
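
The "Experiment Setup" row fixes the optimizer and a replay-batch schedule that grows with the number of seen classes. Below is a minimal runnable sketch of how those numbers fit together in a one-pass online loop. It is not the authors' implementation: the tiny MLP, synthetic stream, naive buffer, and plain cross-entropy loss are placeholders for OCM's network and mutual-information objective (the quoted λ, α, β weights belong to that objective and are omitted with it); only lr = 0.001, weight decay = 0.0001, N = 10, the seven-slots-per-new-class rule, and the N_b cap of 64 come from the quote.

```python
# Minimal runnable sketch (not the authors' code): the quoted hyperparameters
# wired into a generic replay-based online CL loop. Model, stream, buffer,
# and loss are stand-ins; only the numeric settings come from the paper.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Placeholder network; OCM uses its own architecture.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(),
                      nn.Linear(128, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.0001)

N = 10                        # data increment size (size of X_new)
nb, nb_max = 0, 64            # replay batch size N_b: starts at 0, capped at 64
seen_classes = set()
buffer_x, buffer_y = [], []   # naive unbounded buffer, stand-in for the real one

def stream_batches(num_steps):
    """Synthetic stand-in for the online stream: random images and labels."""
    for step in range(num_steps):
        cls = step // 20 % 10  # new classes appear as the stream advances
        yield torch.randn(N, 1, 28, 28), torch.full((N,), cls, dtype=torch.long)

for x_new, y_new in stream_batches(num_steps=100):
    # Grow the replay batch by seven slots per newly seen class, capped at 64.
    for c in set(y_new.tolist()) - seen_classes:
        seen_classes.add(c)
        nb = min(nb + 7, nb_max)

    # Sample a replay batch X_buf of size N_b (if the buffer has enough data).
    k = min(nb, len(buffer_x))
    if k > 0:
        idx = random.sample(range(len(buffer_x)), k)
        x = torch.cat([x_new, torch.stack([buffer_x[i] for i in idx])])
        y = torch.cat([y_new, torch.stack([buffer_y[i] for i in idx])])
    else:
        x, y = x_new, y_new

    loss = F.cross_entropy(model(x), y)  # placeholder for the OCM objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    buffer_x.extend(x_new)               # store the new increment for replay
    buffer_y.extend(y_new)
```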