Incremental Learning of Structured Memory via Closed-Loop Transcription

Authors: Shengbang Tong, Xili Dai, Ziyang Wu, Mingyang Li, Brent Yi, Yi Ma

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our method can effectively alleviate catastrophic forgetting, achieving significantly better performance than prior work of generative replay on MNIST, CIFAR-10, and ImageNet-50, despite requiring fewer resources.
Researcher Affiliation | Academia | Shengbang Tong (1), Xili Dai (2), Ziyang Wu (1), Mingyang Li (3), Brent Yi (1), Yi Ma (1,3); (1) University of California, Berkeley; (2) The Hong Kong University of Science and Technology (Guangzhou); (3) Tsinghua-Berkeley Shenzhen Institute (TBSI), Tsinghua University
Pseudocode | Yes | A ALGORITHM OUTLINE ... Algorithm 1 FORMING MEMORY MEAN AND COVARIANCE(Zt, k, r) ... Algorithm 2 MEMORY SAMPLING(M1, ..., Mt, k, r, C) ... Algorithm 3 i-CTRL (a hedged sketch of the memory-statistics idea follows the table)
Open Source Code | No | We will also make our source code available upon request by the reviewers or the area chairs.
Open Datasets | Yes | We conduct experiments on the following datasets: MNIST (LeCun et al., 1998), CIFAR-10 (Krizhevsky et al., 2014), and ImageNet-50 (Deng et al., 2009).
Dataset Splits | Yes | For both MNIST and CIFAR-10, the 10 classes are split into 5 tasks with 2 classes each or 10 tasks with 1 class each; for ImageNet-50, the 50 classes are split into 5 tasks of 10 classes each. For MNIST and CIFAR-10 experiments, for the encoder f and decoder g, we adopt a very simple network architecture modified from DCGAN (Radford et al., 2016), which is merely a four-layer convolutional network. (A task-split sketch follows the table.)
Hardware Specification | Yes | All experiments are conducted with 1 or 2 RTX 3090 GPUs.
Software Dependencies | No | For all experiments, we use Adam (Kingma & Ba, 2014) as our optimizer, with hyperparameters β1 = 0.5, β2 = 0.999. Learning rate is set to be 0.0001. We choose ϵ2 = 1.0, γ = 1, and λ = 10 for both equations (8) and (9) in all experiments.
Experiment Setup | Yes | For all experiments, we use Adam (Kingma & Ba, 2014) as our optimizer, with hyperparameters β1 = 0.5, β2 = 0.999. Learning rate is set to be 0.0001. We choose ϵ2 = 1.0, γ = 1, and λ = 10 for both equations (8) and (9) in all experiments. For MNIST, CIFAR-10, and CIFAR-100, each task is trained for 120 epochs; for ImageNet-50, the first task D1 is trained for 500 epochs with the augmentation constraint used in (Chen et al., 2020), and the remaining 4 incremental tasks are trained for 150 epochs using the normal i-CTRL objective (7). Prototype settings: for MNIST, we choose r = 6, k = 10; for CIFAR-10, we choose r = 12, k = 20; for ImageNet-50, we use r = 10, k = 15. (An optimizer-configuration sketch follows the table.)
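
The Pseudocode row refers to Algorithms 1 and 2 (FORMING MEMORY MEAN AND COVARIANCE and MEMORY SAMPLING). Those procedures are not reproduced here; the following is only a minimal sketch of the general idea of replaying from stored per-class feature statistics. The function names, the use of NumPy, the Gaussian sampling step, and the omission of the k and r parameters are all assumptions made for illustration, not the paper's algorithm.

```python
# Hypothetical sketch only: replay old-class features from stored statistics.
# The paper's Algorithms 1-2 take parameters k and r (see Prototype settings);
# their exact roles are not modeled here.
import numpy as np

def form_memory(Z: np.ndarray):
    """Z: (n, d) encoder features of one class; keep only mean and covariance."""
    mu = Z.mean(axis=0)
    cov = np.cov(Z, rowvar=False)
    return mu, cov

def sample_memory(mu: np.ndarray, cov: np.ndarray, n_samples: int) -> np.ndarray:
    """Draw surrogate features for replay from the stored Gaussian statistics."""
    return np.random.multivariate_normal(mu, cov, size=n_samples)

# Usage: replay features for a past class without keeping its raw images.
Z_old = np.random.randn(500, 128)   # stand-in for learned features of one class
mu, cov = form_memory(Z_old)
replayed = sample_memory(mu, cov, n_samples=64)
```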
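The Dataset Splits row describes class-incremental splits (e.g., 5 tasks of 2 classes each for MNIST and CIFAR-10, 5 tasks of 10 classes each for ImageNet-50). Below is a minimal sketch of building such splits; the helper name is hypothetical.

```python
# Hypothetical sketch: partition class labels into consecutive incremental tasks.
from typing import List

def make_task_splits(num_classes: int, classes_per_task: int) -> List[List[int]]:
    """Split class labels 0..num_classes-1 into tasks of equal size."""
    assert num_classes % classes_per_task == 0
    return [
        list(range(start, start + classes_per_task))
        for start in range(0, num_classes, classes_per_task)
    ]

# MNIST / CIFAR-10: 5 tasks of 2 classes each.
print(make_task_splits(10, 2))        # [[0, 1], [2, 3], ..., [8, 9]]
# ImageNet-50: 5 tasks of 10 classes each.
print(len(make_task_splits(50, 10)))  # 5
```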
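The Experiment Setup row reports Adam with β1 = 0.5, β2 = 0.999 and a learning rate of 0.0001. Here is a minimal sketch of that optimizer configuration, assuming PyTorch; the encoder and decoder modules below are placeholders, not the paper's four-layer DCGAN-style networks.

```python
# Hypothetical sketch: optimizer settings as reported in the Experiment Setup row,
# assuming PyTorch. The networks are placeholders only.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1), nn.ReLU())
decoder = nn.Sequential(nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1), nn.Tanh())

# Separate optimizers for encoder f and decoder g, both with the reported hyperparameters.
opt_f = torch.optim.Adam(encoder.parameters(), lr=1e-4, betas=(0.5, 0.999))
opt_g = torch.optim.Adam(decoder.parameters(), lr=1e-4, betas=(0.5, 0.999))
```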