Make Continual Learning Stronger via C-Flat

Authors: Ang Bian, Wei Li, Hangjie Yuan, Chengrong Yu, Mang Wang, Zixiang Zhao, Aojun Lu, Pengliang Ji, Tao Feng

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility variables, results, and supporting LLM responses:
Research Type: Experimental. We evaluate the performance on CIFAR-100, ImageNet-100 and Tiny-ImageNet. ... Table 1 empirically demonstrates the superiority of our method: it makes continual learning stronger. ... As shown in Figure 2, models trained with vanilla-SGD exhibit higher maximal Hessian eigenvalues ... while our method induces a significant drop in Hessian eigenvalues ... leading to flatter minima. ... We perform an ablation study in two cases: (i) the influence of λ and ρ on different CL methods; (ii) the influence of ρ and its scheduler on different optimizers.
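The flatness evidence above is reported as maximal Hessian eigenvalues computed with PyHessian [59]. As a minimal, standalone illustration of what is being measured, the sketch below estimates the top Hessian eigenvalue of a loss via power iteration over Hessian-vector products; the function name and iteration count are assumptions for demonstration, not the paper's measurement code.

```python
# Hedged sketch: estimate the maximal Hessian eigenvalue of the loss by power
# iteration with Hessian-vector products. The paper uses PyHessian [59]; this
# standalone version only illustrates the quantity reported in Figure 2.
import torch

def top_hessian_eigenvalue(model, loss_fn, data, target, iters=20):
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(data), target)
    grads = torch.autograd.grad(loss, params, create_graph=True)

    # start from a random direction with unit norm
    v = [torch.randn_like(p) for p in params]
    v_norm = torch.sqrt(sum((x * x).sum() for x in v))
    v = [x / v_norm for x in v]

    eig = 0.0
    for _ in range(iters):
        # Hessian-vector product: Hv = d(g . v)/dw
        gv = sum((g * x).sum() for g, x in zip(grads, v))
        hv = torch.autograd.grad(gv, params, retain_graph=True)
        eig = sum((h * x).sum() for h, x in zip(hv, v)).item()  # Rayleigh quotient v^T H v
        hv_norm = torch.sqrt(sum((h * h).sum() for h in hv))
        v = [h / (hv_norm + 1e-12) for h in hv]
    return eig
```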
Researcher Affiliation: Collaboration. Ang Bian (1), Wei Li (1,2), Hangjie Yuan (3,4), Chengrong Yu (1), Mang Wang (5), Zixiang Zhao (6), Aojun Lu (1), Pengliang Ji (7), Tao Feng (2); 1 Sichuan University, 2 Tsinghua University, 3 DAMO Academy, Alibaba Group, 4 Zhejiang University, 5 ByteDance, 6 Xi'an Jiaotong University, 7 Carnegie Mellon University.
Pseudocode: Yes. Algorithm 1: C-Flat Optimization ... Algorithm 2: C-Flat for GPM-family at T > 1.
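Algorithm 1 itself is not reproduced here. As a rough orientation only, the sketch below shows a generic SAM-style sharpness-aware update in PyTorch, using the ρ (gradient-ascent radius) and λ (penalty weight) hyper-parameters the paper ablates; it is an assumed simplification, not the authors' C-Flat procedure.

```python
# Minimal SAM-style flat-minima step (assumed simplification, not the exact
# C-Flat Algorithm 1). `rho` and `lam` mirror the rho/lambda hyper-parameters
# discussed in the paper; the update rule here is a generic sharpness-aware one.
import torch

def flat_minima_step(model, loss_fn, data, target, base_opt, rho=0.05, lam=1.0):
    """One update; assumes gradients are zero on entry."""
    # 1) gradient at the current weights
    loss_fn(model(data), target).backward()
    params = [p for p in model.parameters() if p.grad is not None]
    grads0 = [p.grad.detach().clone() for p in params]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads0]))

    # 2) ascend to a nearby worst-case point: w <- w + rho * g / ||g||
    eps = []
    with torch.no_grad():
        for p, g in zip(params, grads0):
            e = rho * g / (grad_norm + 1e-12)
            p.add_(e)
            eps.append(e)

    # 3) gradient at the perturbed point (captures local sharpness)
    model.zero_grad()
    loss_fn(model(data), target).backward()

    # 4) restore the weights and step along grad(w) + lam * grad(w + eps)
    with torch.no_grad():
        for p, e, g0 in zip(params, eps, grads0):
            p.sub_(e)
            p.grad.mul_(lam).add_(g0)
    base_opt.step()
    model.zero_grad()
```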
Open Source Code: Yes. Code is available at https://github.com/WanNaa/C-Flat.
Open Datasets: Yes. Datasets: We evaluate the performance on CIFAR-100, ImageNet-100 and Tiny-ImageNet. In adherence to [66, 67], the random seed for class-order shuffling is fixed at 1993.
Dataset Splits: Yes. Subsequently, we follow two typical class splits in CIL: (i) divide all Yb classes equally into B phases, denoted as B0_Incy; (ii) treat half of the total classes as the initial phase, followed by an equal division of the remaining classes into incremental phases, denoted as B50_Incy. In both settings, y denotes the number of new classes learned per task.
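A minimal sketch of the two split protocols, assuming CIFAR-100's 100 classes and the fixed class-order seed 1993; the helper name make_class_splits is hypothetical and not taken from the C-Flat repository.

```python
# Illustrative construction of the B0_Incy and B50_Incy class splits described
# above. Seed 1993 fixes the class-order shuffle, matching the quoted setup.
import numpy as np

def make_class_splits(num_classes=100, init_classes=0, inc=10, seed=1993):
    """Return a list of per-task class lists.

    init_classes = 0   -> B0_Inc{inc}:  equal splits of `inc` classes per task
    init_classes = 50  -> B50_Inc{inc}: half the classes first, then `inc` per task
    """
    rng = np.random.RandomState(seed)
    order = rng.permutation(num_classes).tolist()

    tasks = []
    if init_classes > 0:
        tasks.append(order[:init_classes])
    rest = order[init_classes:]
    tasks += [rest[i:i + inc] for i in range(0, len(rest), inc)]
    return tasks

# B0_Inc10: 10 tasks of 10 classes each
print([len(t) for t in make_class_splits(init_classes=0, inc=10)])
# B50_Inc5: one 50-class base task followed by 10 tasks of 5 classes
print([len(t) for t in make_class_splits(init_classes=50, inc=5)])
```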
Hardware Specification: No. The paper does not specify the types of GPUs, CPUs, or other specific hardware used for running the experiments; it only refers generally to 'compute resources' in the NeurIPS checklist.
Software Dependencies: No. The paper mentions 'a vanilla-SGD optimizer [71]' and 'PyHessian [59]' but does not provide specific version numbers for these or for other key software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup: Yes. For a given dataset, we study all methods using the same network architecture following repo [66, 67], i.e., ResNet-32 for CIFAR and ResNet-18 for ImageNet. If not specified otherwise, the hyper-parameters for all models adhere to the settings in the open-source library [66, 67]. Each task is initialized with the same ρ and η, which drop with iterations according to the scheduler from [70]. To ensure a fair comparison, all models are trained with a vanilla-SGD optimizer [71], and the proposed method is plugged into SGD. ... λ controls the strength of the C-Flat penalty ... ρ controls the step length of gradient ascent.
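The setup only states that ρ (and η) drop with iterations following the scheduler from [70]; the cosine decay below is an assumed stand-in to illustrate how such a schedule can be wired up, not the exact schedule used in the paper.

```python
# Illustrative rho scheduler (assumption): cosine decay from rho_max to rho_min
# over training, standing in for the unspecified scheduler of [70].
import math

class RhoScheduler:
    def __init__(self, rho_max, rho_min, total_steps):
        self.rho_max, self.rho_min, self.total_steps = rho_max, rho_min, total_steps

    def __call__(self, step):
        # fraction of training completed, clipped to [0, 1]
        t = min(step, self.total_steps) / self.total_steps
        return self.rho_min + 0.5 * (self.rho_max - self.rho_min) * (1 + math.cos(math.pi * t))

sched = RhoScheduler(rho_max=0.05, rho_min=0.01, total_steps=10_000)
rho_start, rho_end = sched(0), sched(10_000)  # 0.05 at the start, 0.01 at the end
```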