Online Continual Learning through Mutual Information Maximization
Authors: Yiduo Guo, Bing Liu, Dongyan Zhao
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation shows that OCM substantially outperforms the online CL baselines. For example, for CIFAR10, OCM improves the accuracy of the best baseline by 13.1% from 64.1% (baseline) to 77.2% (OCM). ... Empirical evaluation using benchmark datasets MNIST, CIFAR10, CIFAR100 and Tiny Image Net shows that OCM outperforms the state-of-the-art online CL systems markedly. |
| Researcher Affiliation | Academia | 1Wangxuan Institute of Computer Technology, Peking University. 2Artificial Intelligence Institute, Peking University. 3Department of Computer Science, University of Illinois at Chicago. |
| Pseudocode | Yes | The pseudo-code of our algorithm is given in Algorithms 1 and 2 in Appendix 3. |
| Open Source Code | Yes | The code is publicly available at https://github.com/gydpku/OCM. |
| Open Datasets | Yes | Evaluation data. We use 4 image classification datasets. MNIST (LeCun et al., 1998) has 10 classes with 60,000 examples for training and 10,000 examples for testing. It is split into 5 disjoint tasks with 2 classes per task. CIFAR10 (Krizhevsky & Hinton, 2009) has 10 classes with 50,000 for training and 10,000 for testing. CIFAR100 (Krizhevsky & Hinton, 2009) has 100 classes with 50,000 for training and 10,000 for testing. Tiny ImageNet (Le & Yang, 2015) has 200 classes. |
| Dataset Splits | No | The paper specifies training and testing data sizes but does not explicitly mention a separate validation set split or percentage. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as CPU/GPU models or memory. |
| Software Dependencies | No | The paper mentions standard components (e.g., the Adam optimizer) but does not specify software dependencies or version numbers. |
| Experiment Setup | Yes | For all datasets, OCM is trained with the Adam optimizer. We set the learning rate as 0.001 and fix the weight decay as 0.0001. Following (Shim et al., 2021), we set each data increment size N to 10 (the size of X_new) for all systems. For the memory buffer batch (X_buf) size N_b, in OCM, we initialize N_b as zero and increase it by seven slots when the system meets a new class. We set the max N_b allowed as 64. ... We set λ as 0.5 (Eq. 9 in main paper). ... We set α as 1 and β as 2 for (2) and α as 0 and β still as 2 for (3). |
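
The two sketches below are illustrative only. The first reproduces the class-incremental task split described in the Open Datasets row; all class orderings, function names, and paths are assumptions, not taken from the paper or the official repository at https://github.com/gydpku/OCM.

```python
# Hedged sketch: split a torchvision dataset into disjoint class-incremental tasks,
# e.g., CIFAR10 -> 5 tasks with 2 classes per task (as in the Open Datasets row).
# The sequential class ordering below is an assumption, not taken from the paper.
import numpy as np
from torch.utils.data import Subset
from torchvision import datasets, transforms

def make_tasks(dataset, num_tasks, classes_per_task):
    targets = np.asarray(dataset.targets)
    tasks = []
    for t in range(num_tasks):
        task_classes = list(range(t * classes_per_task, (t + 1) * classes_per_task))
        idx = np.where(np.isin(targets, task_classes))[0]
        tasks.append(Subset(dataset, idx))
    return tasks

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())
train_tasks = make_tasks(train_set, num_tasks=5, classes_per_task=2)
```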
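The second is a minimal sketch of the hyperparameters reported in the Experiment Setup row (Adam, learning rate 0.001, weight decay 0.0001, stream batches of 10, replay batch grown by 7 slots per new class up to 64, λ = 0.5). The encoder, memory buffer, and `mi_loss` are hypothetical placeholders, and how λ enters the objective here is an assumption rather than the authors' exact formulation.

```python
# Hedged sketch of the hyperparameters in the Experiment Setup row. The encoder,
# memory buffer, and mutual-information loss are placeholders (assumptions), not
# the authors' implementation.
import torch
from torch import optim

LR, WEIGHT_DECAY = 1e-3, 1e-4    # reported learning rate and weight decay
STREAM_BATCH = 10                # data increment size N (batch size of the stream)
SLOTS_PER_NEW_CLASS = 7          # replay batch grows by 7 slots per new class
MAX_REPLAY_BATCH = 64            # maximum allowed N_b
LAMBDA = 0.5                     # λ from Eq. 9; how it combines the terms below is an assumption

model = build_encoder_classifier()                       # placeholder network
optimizer = optim.Adam(model.parameters(), lr=LR, weight_decay=WEIGHT_DECAY)

replay_batch_size, seen_classes = 0, set()
for x_new, y_new in stream_loader:                       # placeholder online stream
    for c in set(y_new.tolist()):                        # grow N_b on unseen classes
        if c not in seen_classes:
            seen_classes.add(c)
            replay_batch_size = min(replay_batch_size + SLOTS_PER_NEW_CLASS,
                                    MAX_REPLAY_BATCH)

    x_buf, y_buf = memory.sample(replay_batch_size)      # placeholder buffer API
    loss = mi_loss(model, x_new, y_new) + LAMBDA * mi_loss(model, x_buf, y_buf)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    memory.update(x_new, y_new)                          # placeholder buffer update
```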