Optimizing Mode Connectivity for Class Incremental Learning
Authors: Haitao Wen, Haoyang Cheng, Heqian Qiu, Lanxiao Wang, Lili Pan, Hongliang Li
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on CIFAR-100, ImageNet-100, and ImageNet-1K show consistent improvements when adapting EOPC to existing representative CIL methods. Our code is available at https://github.com/HaitaoWen/EOPC. In this section, we will adapt EOPC to several existing representative CIL methods in a post-processing manner on several benchmarks for comparisons. |
| Researcher Affiliation | Academia | 1University of Electronic Science and Technology of China. |
| Pseudocode | Yes | A. Algorithm Algorithm 1 Adapting EOPC to CIL methods |
| Open Source Code | Yes | Our code is available at https://github.com/HaitaoWen/EOPC. |
| Open Datasets | Yes | 1) CIFAR-100 contains 100 classes, each class has 500 training samples and 100 testing samples with image size 32×32 (Krizhevsky et al., 2009). 2) ImageNet-1K contains 1000 classes, each class has about 1300 training samples and 50 validation samples (Deng et al., 2009). |
| Dataset Splits | Yes | 1) CIFAR-100 contains 100 classes, each class has 500 training samples and 100 testing samples with image size 32×32 (Krizhevsky et al., 2009). 2) ImageNet-1K contains 1000 classes, each class has about 1300 training samples and 50 validation samples (Deng et al., 2009). We split these datasets into a sequence of tasks: the first task contains half of the classes, e.g., 50 classes for CIFAR-100, and the rest of the classes are equally assigned to 5, 10, and 25 steps for incremental learning. A split sketch follows the table. |
| Hardware Specification | No | We allocate 2 GPUs for the distributed training of DyTox and use the distributed memory option. |
| Software Dependencies | Yes | We use PyTorch (Paszke et al., 2019) to reimplement iCaRL (Rebuffi et al., 2017), LUCIR (Hou et al., 2019), PODNet (Douillard et al., 2020), AANet (Liu et al., 2021), and AFC (Kang et al., 2022) in the same environment for fair comparisons. |
| Experiment Setup | Yes | For the hyperparameters of EOPC, we choose the SGD optimizer with an initial learning rate of 0.1, which is decayed by a factor of 0.1 at 10 and 15 epochs. The path between continual minima is optimized for 20 epochs with a batch size of 128. The maximum order of the Fourier series (i.e., N in Equation (10)) is set to 4, and the radius of the cylinder is chosen from {2, 4, 6}. We select an appropriate λ from {0.75, 0.85, 0.9} for OPC and uniformly sample 10 points in the interval [0.1, 0.95] and take their average loss as the loss of each iteration. Kaiming initialization is used to initialize the new parameter vector z_t in OPC. The number of total sampling points in ensembling is set to 10 and the interval τ is set to 0.1. A configuration sketch follows the table. |
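
For clarity, the following minimal sketch (not the authors' released code) illustrates the class-incremental split described in the Dataset Splits row: a base task holding half of the classes, followed by equally sized incremental steps. The function name `build_task_splits` and the fixed random seed are illustrative assumptions.

```python
# Sketch of the class-incremental split: first task gets half of the classes,
# the remaining classes are divided equally across the incremental steps.
import numpy as np

def build_task_splits(num_classes=100, base_classes=50, num_steps=5, seed=1993):
    """Return one array of class ids per task: a base task plus num_steps incremental steps."""
    rng = np.random.default_rng(seed)           # the fixed seed here is an assumption
    class_order = rng.permutation(num_classes)  # a fixed random class order, as is common in CIL code
    splits = [class_order[:base_classes]]       # first task: half of the classes
    per_step = (num_classes - base_classes) // num_steps
    for s in range(num_steps):
        start = base_classes + s * per_step
        splits.append(class_order[start:start + per_step])
    return splits

# Example: CIFAR-100 with a 50-class base task and 10 incremental steps of 5 classes each.
tasks = build_task_splits(num_classes=100, base_classes=50, num_steps=10)
assert sum(len(t) for t in tasks) == 100
```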
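
The Experiment Setup row can likewise be summarized as a hedged configuration sketch. Only the optimizer, the learning-rate schedule, the epoch count, and the 10 uniformly sampled λ values in [0.1, 0.95] come from the reported hyperparameters; the path model `curve` (a Fourier-parametrized connector between the previous minimum and the new parameter vector z_t), `loader`, `cil_loss`, and the momentum value are hypothetical placeholders.

```python
# Sketch of the reported path-optimization schedule for EOPC (assumptions noted inline).
import torch

def optimize_path(curve, loader, cil_loss, device="cuda"):
    """Optimize the parametric path for 20 epochs with the schedule reported in the paper."""
    params = [p for p in curve.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=0.1, momentum=0.9)  # momentum value is an assumption
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10, 15], gamma=0.1)
    lambdas = torch.linspace(0.1, 0.95, steps=10)               # 10 uniform samples in [0.1, 0.95]
    for epoch in range(20):                                     # path optimized for 20 epochs
        for x, y in loader:                                     # loader built with batch size 128
            x, y = x.to(device), y.to(device)
            # Average the loss over the sampled path positions,
            # matching "take their average loss as the loss of each iteration".
            loss = torch.stack([cil_loss(curve(x, lam.item()), y) for lam in lambdas]).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()
```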