Adaptive Shortcut Debiasing for Online Continual Learning

Authors: Doyoung Kim, Dongmin Park, Yooju Shin, Jihwan Bang, Hwanjun Song, Jae-Gil Lee

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on five benchmark datasets demonstrate that, when combined with various OCL algorithms, DropTop increases the average accuracy by up to 10.4% and decreases the forgetting by up to 63.2%.
Researcher Affiliation | Academia | Doyoung Kim, Dongmin Park, Yooju Shin, Jihwan Bang, Hwanjun Song, Jae-Gil Lee* KAIST, Daejeon, Republic of Korea {dodokim, dongminpark, yooju.shin, jihwan.bang, songhwanjun, jaegil}@kaist.ac.kr
Pseudocode | Yes | Appendix A describes the pseudocode of adaptive intensity shifting, which is self-explanatory. (A generic, non-authoritative illustration of such an intensity-shifting loop is sketched after the table.)
Open Source Code | Yes | All algorithms are implemented using PyTorch 1.12.1 and tested on a single NVIDIA RTX 2080Ti GPU, and the source code is available at https://github.com/kaist-dmlab/DropTop.
Open Datasets | Yes | We use the Split CIFAR-10 (Krizhevsky, Hinton et al. 2009), Split CIFAR-100 (Krizhevsky, Hinton et al. 2009), and Split ImageNet-9 (Xiao et al. 2020) for the biased setup. ... In ImageNet-Only FG (Xiao et al. 2020), the background is removed to evaluate the dependency on the background in image recognition; in ImageNet-Stylized (Geirhos et al. 2019), the local texture is shifted by style transfer, and the reliance of a model on the local texture cue is removed. (A task-split sketch for Split CIFAR-10 follows the table.)
Dataset Splits | No | The paper describes an online continual learning setting where data streams emerge continually. While it uses an 'episodic memory' for internal loss calculation and 'validates' performance, it does not specify traditional fixed training/validation/test dataset splits with percentages or absolute counts for the entire dataset.
Hardware Specification | Yes | All algorithms are implemented using PyTorch 1.12.1 and tested on a single NVIDIA RTX 2080Ti GPU
Software Dependencies | Yes | All algorithms are implemented using PyTorch 1.12.1 and tested on a single NVIDIA RTX 2080Ti GPU, and the source code is available at https://github.com/kaist-dmlab/DropTop.
Experiment Setup | Yes | For all algorithms and datasets, the size of a minibatch from the data stream and the replay memory is set to 32, following (Buzzega et al. 2020). The size of episodic memory is set to 500 for Split CIFAR-10 and Split ImageNet-9 and 2,000 for Split CIFAR-100, depending on the total number of classes. ... We train ResNet18 using SGD with a learning rate of 0.1 (Buzzega et al. 2020; Shim et al. 2021) for all ResNet-based algorithms. We optimize L2P and DualPrompt with a pretrained ViT-B/16 using Adam with a learning rate of 0.05, β1 of 0.9, and β2 of 0.999. ... For attentive debiasing, we fix the total drop ratio γ to 5.0% and set the initial drop intensity κ0 to 5.0% for ER, DER++, and MIR and to 0.5% for GSS and ASER, differently depending on the sampling method. For L2P and DualPrompt, we set γ and κ0 to 2.0% and 1.0%, respectively, owing to the difference of the backbone network. For adaptive intensity shifting, we fix the history length l to 10... The alternating period p = 3 and the shifting step size α = 0.9 are adequate across the algorithms and datasets. (These hyperparameters are collected in the config sketch after the table.)
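
The quoted evidence refers to Split CIFAR-10 without spelling out the task construction. Below is a minimal sketch, not the authors' code, of the conventional class-incremental protocol (10 classes split into 5 tasks of 2 classes each, stream minibatch size 32 as quoted); the paper's exact class ordering and stream construction may differ.

```python
# Minimal Split CIFAR-10 task construction (conventional 5-task / 2-class protocol).
# This is an illustrative sketch, not the DropTop repository's data pipeline.
import torch
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())
targets = torch.tensor(train_set.targets)

num_tasks, classes_per_task = 5, 2
task_loaders = []
for t in range(num_tasks):
    task_classes = torch.arange(t * classes_per_task, (t + 1) * classes_per_task)
    idx = torch.isin(targets, task_classes).nonzero(as_tuple=True)[0].tolist()
    # Stream minibatch size of 32, as in the quoted setup (Buzzega et al. 2020).
    task_loaders.append(DataLoader(Subset(train_set, idx), batch_size=32, shuffle=True))
```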
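
The hyperparameters quoted in the Experiment Setup row can be summarized in a single configuration object. The key names below (e.g., total_drop_ratio, init_drop_intensity) are illustrative and need not match the argument names used in the official DropTop repository; the values are taken from the quoted text.

```python
# Illustrative summary of the quoted hyperparameters; key names are hypothetical.
CONFIG = {
    "batch_size": 32,  # minibatch from both the data stream and the replay memory
    "memory_size": {"split_cifar10": 500, "split_imagenet9": 500, "split_cifar100": 2000},
    "resnet18": {"optimizer": "SGD", "lr": 0.1},
    "l2p_dualprompt": {"backbone": "ViT-B/16", "optimizer": "Adam",
                       "lr": 0.05, "betas": (0.9, 0.999)},
    "attentive_debiasing": {
        "total_drop_ratio": {"resnet_methods": 0.05, "l2p_dualprompt": 0.02},   # gamma
        "init_drop_intensity": {"ER/DER++/MIR": 0.05, "GSS/ASER": 0.005,
                                "L2P/DualPrompt": 0.01},                        # kappa_0
    },
    "adaptive_intensity_shifting": {
        "history_length": 10,     # l
        "alternating_period": 3,  # p
        "shift_step_size": 0.9,   # alpha
    },
}
```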
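
The Pseudocode row notes that adaptive intensity shifting is specified in the paper's Appendix A, which is not reproduced here. The skeleton below is only a generic illustration of how the quoted knobs (history length l = 10, alternating period p = 3, step size α = 0.9, initial intensity κ0) could plug into a training loop; the decision rule is a placeholder and the paper's actual criterion differs.

```python
# Generic intensity-shifting skeleton, NOT the paper's Appendix A pseudocode.
from collections import deque

l, p, alpha = 10, 3, 0.9
kappa = 0.05                        # kappa_0 for ER/DER++/MIR in the quoted setup
loss_history = deque(maxlen=l)      # most recent l training losses

dummy_losses = [2.3, 2.1, 2.0, 1.9, 1.85, 1.8, 1.78, 1.75, 1.74, 1.72, 1.7, 1.69]

for step, loss_value in enumerate(dummy_losses):   # stand-in for real training losses
    loss_history.append(loss_value)
    if step % p == 0 and len(loss_history) == l:
        improving = loss_history[-1] < sum(loss_history) / l   # placeholder trend check
        kappa = kappa / alpha if improving else kappa * alpha  # shift intensity by alpha
```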