Defying Imbalanced Forgetting in Class Incremental Learning
Authors: Shixiong Xu, Gaofeng Meng, Xing Nie, Bolin Ni, Bin Fan, Shiming Xiang
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that CLAD consistently improves current replay-based methods, resulting in performance gains of up to 2.56%. Experiments and statistical analysis are conducted to demonstrate that imbalanced forgetting results from varying semantic similarity between inter-task classes. Extensive experiments on CIFAR-100 (Krizhevsky et al. 2009) and ImageNet (Deng et al. 2009) indicate that CLAD provides a consistent and impressive performance improvement over existing methods. Besides, comprehensive ablation studies are performed to show how the components in CLAD, including the conflict classes selection, regularization coefficient, and buffer size, influence its performance. |
| Researcher Affiliation | Academia | Shixiong Xu1,2, Gaofeng Meng1,2,3*, Xing Nie1,2, Bolin Ni1,2, Bin Fan4, Shiming Xiang1,2 — 1State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences; 2School of Artificial Intelligence, University of Chinese Academy of Sciences; 3Centre for Artificial Intelligence and Robotics, HK Institute of Science & Innovation, Chinese Academy of Sciences; 4School of Intelligence Science and Technology, University of Science and Technology Beijing |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. The methods are described using mathematical formulas and descriptive text. |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., a specific repository link or an explicit statement of code release) for the source code of the methodology described in the paper. |
| Open Datasets | Yes | Three commonly used benchmarks (Hou et al. 2019) are selected to evaluate the proposed method. CIFAR-100 (Krizhevsky et al. 2009) consists of 60,000 images from 100 classes, and the image size is 32×32. ImageNet (Deng et al. 2009) contains about 1.2 million 224×224 RGB images from 1000 classes. ImageNet100 (Rebuffi et al. 2017) is a subset of ImageNet (Deng et al. 2009), which is sampled as in (Hou et al. 2019; Liu, Schiele, and Sun 2021). |
| Dataset Splits | Yes | To be consistent with the protocols of the previous work (Hou et al. 2019; Rebuffi et al. 2017; Liu, Schiele, and Sun 2021; Liu et al. 2020; Hu et al. 2021), all the classes of each dataset are shuffled with seed 1993 before splitting them into tasks. For CIFAR-100 and ImageNet100, half of the classes are selected for the first task to mimic a pre-collected dataset in the real world (Hou et al. 2019); each subsequent task then contains S = 10/5/2 classes. For ImageNet, 100 classes are selected for the first task, then the model learns 100 or 50 classes per task incrementally. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. It only mentions using ResNet-18 as the model architecture. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions). It mentions SGD as the optimizer and ResNet-18 as the backbone. |
| Experiment Setup | Yes | Implementation details. Following the previous studies (Shi et al. 2022; Yan, Xie, and He 2021), we adopt ResNet-18 (He et al. 2016) for all the experiments below. Notably, for CIFAR-100 (Krizhevsky et al. 2009) the kernel size of the first convolution layer is set to 3×3, and the following max-pooling layer is removed for higher feature resolution (Shi et al. 2022). SGD is used as the optimizer. The learning rate is set to 0.1, the batch size to 128, the momentum to 0.9, and the weight decay to 5e-4. For CIFAR-100, all the methods are trained for 160 epochs per task, and the learning rate is multiplied by 0.1 at the 80-th and 120-th epoch. For ImageNet and ImageNet100, the models are trained for 90 epochs per task, and the learning rate is multiplied by 0.1 at the 30-th and 60-th epoch. ... The conflict proportion is set to 0.1 for all experiments empirically. The CLAD coefficient is set to 4 for LUCIR and CwD, while it is 2 for PODNet. |
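
The class-incremental split protocol reported above (shuffle all classes with seed 1993, assign half to the first task, then S classes per subsequent task) can be sketched as follows. This is an illustrative reimplementation, not the authors' code: the exact shuffling routine (Python `random` vs. NumPy) and function names here are assumptions.

```python
import random

def make_class_splits(num_classes=100, first_task=50, per_task=10, seed=1993):
    """Illustrative sketch of the reported split protocol: shuffle all
    class indices with a fixed seed, give half of them to the first task,
    then `per_task` classes to each subsequent task."""
    rng = random.Random(seed)  # seed 1993 per the reported protocol
    classes = list(range(num_classes))
    rng.shuffle(classes)
    tasks = [classes[:first_task]]
    for i in range(first_task, num_classes, per_task):
        tasks.append(classes[i:i + per_task])
    return tasks

# CIFAR-100 / ImageNet100 with S = 10: 1 base task of 50 classes + 5 tasks of 10
tasks = make_class_splits(num_classes=100, first_task=50, per_task=10)
```

With S = 5 or S = 2 the same call yields 10 or 25 incremental tasks, respectively; for full ImageNet one would use `num_classes=1000, first_task=100` and `per_task` of 100 or 50.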
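
The learning-rate schedule in the experiment setup is a standard step decay: start at 0.1 and multiply by 0.1 at fixed milestone epochs (80/120 for CIFAR-100, 30/60 for ImageNet and ImageNet100). A minimal framework-free sketch of that schedule, using only the hyperparameters quoted above:

```python
def lr_at_epoch(epoch, base_lr=0.1, milestones=(80, 120), gamma=0.1):
    """Step decay from the reported setup: the learning rate starts at
    `base_lr` and is multiplied by `gamma` at each milestone epoch.
    Milestones are (80, 120) for CIFAR-100 and (30, 60) for ImageNet."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```

In a PyTorch reproduction this would typically be expressed as `torch.optim.lr_scheduler.MultiStepLR` on an SGD optimizer (lr=0.1, momentum=0.9, weight_decay=5e-4, batch size 128), though the paper does not name the framework.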