Adaptive Shortcut Debiasing for Online Continual Learning
Authors: Doyoung Kim, Dongmin Park, Yooju Shin, Jihwan Bang, Hwanjun Song, Jae-Gil Lee
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on five benchmark datasets demonstrate that, when combined with various OCL algorithms, DropTop increases the average accuracy by up to 10.4% and decreases the forgetting by up to 63.2%. |
| Researcher Affiliation | Academia | Doyoung Kim, Dongmin Park, Yooju Shin, Jihwan Bang, Hwanjun Song, Jae-Gil Lee* KAIST, Daejeon, Republic of Korea {dodokim, dongminpark, yooju.shin, jihwan.bang, songhwanjun, jaegil}@kaist.ac.kr |
| Pseudocode | Yes | Appendix A describes the pseudocode of adaptive intensity shifting, which is self-explanatory. |
| Open Source Code | Yes | All algorithms are implemented using PyTorch 1.12.1 and tested on a single NVIDIA RTX 2080Ti GPU, and the source code is available at https://github.com/kaist-dmlab/DropTop. |
| Open Datasets | Yes | We use the Split CIFAR-10 (Krizhevsky, Hinton et al. 2009), Split CIFAR-100 (Krizhevsky, Hinton et al. 2009), and Split ImageNet-9 (Xiao et al. 2020) for the biased setup. ... In ImageNet-Only FG (Xiao et al. 2020), the background is removed to evaluate the dependency on the background in image recognition; in ImageNet-Stylized (Geirhos et al. 2019), the local texture is shifted by style-transfer, and the reliance of a model on the local texture cue is removed. |
| Dataset Splits | No | The paper describes an online continual learning setting where data streams emerge continually. While it uses an 'episodic memory' for internal loss calculation and 'validates' performance, it does not specify traditional fixed training/validation/test dataset splits with percentages or absolute counts for the entire dataset. |
| Hardware Specification | Yes | All algorithms are implemented using PyTorch 1.12.1 and tested on a single NVIDIA RTX 2080Ti GPU |
| Software Dependencies | Yes | All algorithms are implemented using PyTorch 1.12.1 and tested on a single NVIDIA RTX 2080Ti GPU, and the source code is available at https://github.com/kaist-dmlab/DropTop. |
| Experiment Setup | Yes | For all algorithms and datasets, the size of a minibatch from the data stream and the replay memory is set to 32, following (Buzzega et al. 2020). The size of episodic memory is set to 500 for Split CIFAR-10 and Split ImageNet-9 and 2,000 for Split CIFAR-100 depending on the total number of classes. ... We train ResNet18 using SGD with a learning rate of 0.1 (Buzzega et al. 2020; Shim et al. 2021) for all ResNet-based algorithms. We optimize L2P and DualPrompt with a pretrained ViT-B/16 using Adam with a learning rate of 0.05, β1 of 0.9, and β2 of 0.999. ... For attentive debiasing, we fix the total drop ratio γ to 5.0% and set the initial drop intensity κ0 to 5.0% for ER, DER++, and MIR and to 0.5% for GSS and ASER, differently depending on the sampling method. For L2P and DualPrompt, we set γ and κ0 to 2.0% and 1.0%, respectively, owing to the difference of the backbone network. For adaptive intensity shifting, we fix the history length l to 10... The alternating period p = 3 and the shifting step size α = 0.9 are adequate across the algorithms and datasets. |
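
The Open Datasets row above lists Split CIFAR-10, Split CIFAR-100, and Split ImageNet-9 as the biased benchmarks. Below is a minimal sketch of how such a class-incremental split is commonly constructed from CIFAR-10; the 5-tasks-of-2-classes layout and all identifiers are illustrative assumptions, not code from the paper's repository.

```python
# Hypothetical construction of a "Split CIFAR-10" task stream:
# the 10 classes are partitioned into sequential tasks of 2 classes each.
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

cifar10 = datasets.CIFAR10(root="./data", train=True, download=True,
                           transform=transforms.ToTensor())

classes_per_task = 2                      # assumed task granularity
targets = torch.tensor(cifar10.targets)

tasks = []
for task_id in range(10 // classes_per_task):
    task_classes = range(task_id * classes_per_task,
                         (task_id + 1) * classes_per_task)
    idx = torch.nonzero(sum(targets == c for c in task_classes)).squeeze(1)
    tasks.append(Subset(cifar10, idx.tolist()))

# Each element of `tasks` is then streamed to the online learner one task at a time.
```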
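
The Experiment Setup row above enumerates the reported hyperparameters. The following sketch collects them into a single PyTorch-style configuration; the dataclass and field names are hypothetical, and only the numeric values come from the quoted setup (memory size and drop intensity vary per dataset and algorithm, as noted in the comments).

```python
from dataclasses import dataclass

import torch
import torchvision


@dataclass
class DropTopConfig:
    # Hypothetical field names; values taken from the quoted experiment setup.
    stream_batch_size: int = 32        # minibatch from the data stream
    replay_batch_size: int = 32        # minibatch from the episodic memory
    memory_size: int = 500             # 500 for Split CIFAR-10 / Split ImageNet-9, 2,000 for Split CIFAR-100
    total_drop_ratio: float = 0.05     # gamma = 5.0% (2.0% for L2P / DualPrompt)
    init_drop_intensity: float = 0.05  # kappa_0 = 5.0% for ER/DER++/MIR, 0.5% for GSS/ASER, 1.0% for prompt methods
    history_length: int = 10           # l, for adaptive intensity shifting
    alternating_period: int = 3        # p
    shift_step_size: float = 0.9       # alpha


cfg = DropTopConfig()

# ResNet18-based algorithms: SGD with a learning rate of 0.1.
resnet = torchvision.models.resnet18(num_classes=10)
resnet_opt = torch.optim.SGD(resnet.parameters(), lr=0.1)

# Prompt-based algorithms (L2P, DualPrompt) on a pretrained ViT-B/16:
# Adam with lr 0.05, beta1 0.9, beta2 0.999. The parameter set below is a
# placeholder, since the prompt modules are not defined in this sketch.
prompt_params = [torch.nn.Parameter(torch.zeros(10, 768))]
vit_opt = torch.optim.Adam(prompt_params, lr=0.05, betas=(0.9, 0.999))
```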