Incremental Learning of Structured Memory via Closed-Loop Transcription
Authors: Shengbang Tong, Xili Dai, Ziyang Wu, Mingyang Li, Brent Yi, Yi Ma
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our method can effectively alleviate catastrophic forgetting, achieving significantly better performance than prior work of generative replay on MNIST, CIFAR-10, and ImageNet-50, despite requiring fewer resources. |
| Researcher Affiliation | Academia | Shengbang Tong1, Xili Dai2, Ziyang Wu1, Mingyang Li3, Brent Yi1, Yi Ma1,3. 1 University of California, Berkeley; 2 The Hong Kong University of Science and Technology (Guangzhou); 3 Tsinghua-Berkeley Shenzhen Institute (TBSI), Tsinghua University |
| Pseudocode | Yes (sketch after the table) | A ALGORITHM OUTLINE ... Algorithm 1 FORMING MEMORY MEAN AND COVARIANCE(Zt, k, r) ... Algorithm 2 MEMORY SAMPLING(M1, ..., Mt, k, r, C) ... Algorithm 3 i-CTRL |
| Open Source Code | No | We will also make our source code available upon request by the reviewers or the area chairs. |
| Open Datasets | Yes | We conduct experiments on the following datasets: MNIST (LeCun et al., 1998), CIFAR-10 (Krizhevsky et al., 2014), and ImageNet-50 (Deng et al., 2009). |
| Dataset Splits | Yes (split sketch after the table) | For both MNIST and CIFAR-10, the 10 classes are split into 5 tasks with 2 classes each or 10 tasks with 1 class each; for ImageNet-50, the 50 classes are split into 5 tasks of 10 classes each. For MNIST and CIFAR-10 experiments, for the encoder f and decoder g, we adopt a very simple network architecture modified from DCGAN (Radford et al., 2016), which is merely a four-layer convolutional network. |
| Hardware Specification | Yes | All experiments are conducted with 1 or 2 RTX 3090 GPUs. |
| Software Dependencies | No | For all experiments, we use Adam (Kingma & Ba, 2014) as our optimizer, with hyperparameters β1 = 0.5, β2 = 0.999. Learning rate is set to be 0.0001. We choose ϵ² = 1.0, γ = 1, and λ = 10 for both equations (8) and (9) in all experiments. |
| Experiment Setup | Yes (config sketch after the table) | For all experiments, we use Adam (Kingma & Ba, 2014) as our optimizer, with hyperparameters β1 = 0.5, β2 = 0.999. Learning rate is set to be 0.0001. We choose ϵ² = 1.0, γ = 1, and λ = 10 for both equations (8) and (9) in all experiments. For MNIST, CIFAR-10, and CIFAR-100, each task is trained for 120 epochs; for ImageNet-50, the first task D1 is trained for 500 epochs with the augmentation constraint used in (Chen et al., 2020) and 150 epochs for the remaining 4 incremental tasks using the normal i-CTRL objective (7). Prototype settings: for MNIST, we choose r = 6, k = 10. For CIFAR-10, we choose r = 12, k = 20. For ImageNet-50, we use r = 10, k = 15. |
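
The Pseudocode row lists Algorithm 1 FORMING MEMORY MEAN AND COVARIANCE(Zt, k, r) and Algorithm 2 MEMORY SAMPLING(M1, ..., Mt, k, r, C). The paper's exact procedure is not reproduced here; the numpy sketch below only illustrates one plausible reading of those signatures, in which each class memory stores a feature mean plus the top-r principal directions and variances, and k replay features are resampled from that subspace. The function names and the Gaussian resampling rule are assumptions made for illustration, not the authors' algorithm.

```python
# Hedged sketch of a per-class structured memory: mean + top-r principal subspace.
import numpy as np

def form_memory(Z_t: np.ndarray, r: int) -> dict:
    """Summarize class features Z_t (n x d) by their mean and top-r covariance factors."""
    mean = Z_t.mean(axis=0)
    centered = Z_t - mean
    # Eigen-decomposition of the class covariance via SVD; keep the top-r directions.
    U, S, _ = np.linalg.svd(centered.T @ centered / len(Z_t))
    return {"mean": mean, "dirs": U[:, :r], "vars": S[:r]}

def sample_memory(memory: dict, k: int, rng: np.random.Generator) -> np.ndarray:
    """Draw k replay features from a Gaussian restricted to the stored r-dim subspace."""
    coeffs = rng.standard_normal((k, len(memory["vars"]))) * np.sqrt(memory["vars"])
    return memory["mean"] + coeffs @ memory["dirs"].T

rng = np.random.default_rng(0)
Z_t = rng.standard_normal((500, 128))       # hypothetical features of one past class
mem = form_memory(Z_t, r=12)                # r = 12 as reported for CIFAR-10
replay = sample_memory(mem, k=20, rng=rng)  # k = 20 prototypes per class
```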
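
The Dataset Splits row describes purely class-incremental splits: the 10 classes of MNIST and CIFAR-10 are split into 5 tasks of 2 (or 10 tasks of 1), and the 50 classes of ImageNet-50 into 5 tasks of 10. A minimal sketch of that partitioning follows; the helper name is hypothetical.

```python
# Minimal sketch of the class-incremental task splits described in the table.
from typing import List

def split_classes_into_tasks(num_classes: int, classes_per_task: int) -> List[List[int]]:
    """Partition class labels [0, num_classes) into consecutive, disjoint tasks."""
    assert num_classes % classes_per_task == 0, "classes must divide evenly into tasks"
    return [
        list(range(start, start + classes_per_task))
        for start in range(0, num_classes, classes_per_task)
    ]

mnist_tasks = split_classes_into_tasks(10, 2)        # [[0, 1], [2, 3], ..., [8, 9]]
imagenet50_tasks = split_classes_into_tasks(50, 10)  # 5 tasks of 10 classes each
```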
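
The Experiment Setup row reports Adam with β1 = 0.5, β2 = 0.999 and a learning rate of 0.0001 for all experiments. A minimal PyTorch sketch of that configuration is below; the encoder/decoder modules are single-layer placeholders rather than the four-layer DCGAN-style networks used in the paper, and the use of two separate optimizers is an assumption.

```python
# Hedged sketch of the reported optimizer hyperparameters (Adam, betas=(0.5, 0.999), lr=1e-4).
import torch

encoder = torch.nn.Sequential(torch.nn.Conv2d(3, 64, 4, stride=2, padding=1))            # placeholder f
decoder = torch.nn.Sequential(torch.nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1))   # placeholder g

opt_f = torch.optim.Adam(encoder.parameters(), lr=1e-4, betas=(0.5, 0.999))
opt_g = torch.optim.Adam(decoder.parameters(), lr=1e-4, betas=(0.5, 0.999))

# Per-task training lengths quoted in the table (for context only):
# 120 epochs per task for MNIST/CIFAR-10/CIFAR-100; 500 epochs for the first
# ImageNet-50 task, then 150 epochs for each of the 4 incremental tasks.
```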