Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting

Authors: Sayna Ebrahimi, Suzanne Petryk, Akash Gokul, William Gan, Joseph E. Gonzalez, Marcus Rohrbach, Trevor Darrell

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We have evaluated our approach in the standard and few-shot settings and observed a consistent improvement across various CL approaches using different architectures and techniques to generate model explanations, demonstrating a promising connection between explainability and continual learning. We empirically show the effect of RRR in standard and few-shot class incremental learning (CIL) scenarios on popular benchmark datasets including CIFAR100, ImageNet100, and Caltech-UCSD Birds 200 using different network architectures, where RRR improves overall accuracy and forgetting over experience replay and other memory-based methods.
Researcher Affiliation | Collaboration | Sayna Ebrahimi1, Suzanne Petryk1, Akash Gokul1, William Gan1, Joseph E. Gonzalez1, Marcus Rohrbach2, Trevor Darrell1; 1UC Berkeley, 2Facebook AI Research; {sayna,spetryk,akashgokul,wjgan,jegonzal,trevordarrell}@berkeley.edu, mrf@fb.com
Pseudocode | Yes | Algorithm 1: Remembering for the Right Reasons (RRR) for Continual Learning
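The report names the paper's Algorithm 1 but does not reproduce it. As a rough sketch of the idea (the function name, the L1 distance, and the weighting `lam` are assumptions for illustration, not taken from the paper), RRR augments the usual replay loss with a penalty on how far the model's explanations for replay-buffer images have drifted from the explanations saved when those images were first learned:

```python
def rrr_objective(task_loss, current_maps, saved_maps, lam=1.0):
    # Explanation-replay penalty in the spirit of RRR (a hedged sketch):
    # total loss = task loss + lam * mean absolute difference between the
    # saliency map the model produces now for each replay image
    # (current_maps) and the map stored in the replay buffer when that
    # image was first learned (saved_maps). Each map is a flat list of
    # per-pixel attribution values.
    n = sum(len(m) for m in saved_maps)
    drift = sum(abs(c - s)
                for cur, old in zip(current_maps, saved_maps)
                for c, s in zip(cur, old))
    return task_loss + lam * drift / n
```

When the explanations have not drifted, the penalty vanishes and only the task loss remains, so the regularizer only activates when the model starts attending to the "wrong" evidence on old classes.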
Open Source Code | Yes | Our code is available at https://github.com/SaynaEbrahimi/Remembering-for-the-Right-Reasons.
Open Datasets | Yes | We empirically show the effect of RRR in standard and few-shot class incremental learning (CIL) scenarios on popular benchmark datasets including CIFAR100, ImageNet100, and Caltech-UCSD Birds 200 using different network architectures, where RRR improves overall accuracy and forgetting over experience replay and other memory-based methods. (Wah et al., 2011) (Krizhevsky & Hinton, 2009)
Dataset Splits | Yes | The remaining 100 classes are divided into 10 tasks, where 5 samples per class are randomly selected as the training set, while the test set is kept intact, containing nearly 300 images per task.
Hardware Specification | No | The paper mentions network architectures (ResNet18, AlexNet, SqueezeNet), optimizers (RAdam), and software frameworks (PyTorch), but does not specify any particular hardware, such as GPU models or CPU types, used for running the experiments.
Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2017)' but does not specify a precise version number for PyTorch or any other software dependencies, libraries, or solvers used in the experiments.
Experiment Setup | Yes | We used the RAdam (Liu et al., 2019) optimizer with a learning rate of 0.001, which was reduced by 0.2 at epochs 20, 40, and 60, and trained for a total of 70 epochs with a batch size of 128 for the first task and 10 for the remaining tasks.
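Reading "reduced by 0.2" as multiplication by a factor of 0.2 at each milestone (the usual step-decay convention, e.g. PyTorch's MultiStepLR; this reading is an assumption, since the quote is ambiguous), the reported schedule can be sketched as:

```python
def learning_rate(epoch, base_lr=0.001, milestones=(20, 40, 60), gamma=0.2):
    # Step-decay schedule matching the reported setup: starting from
    # base_lr, the learning rate is multiplied by gamma at each
    # milestone epoch that has already been reached.
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# In PyTorch this would correspond to wrapping
# torch.optim.lr_scheduler.MultiStepLR(optimizer,
#                                      milestones=[20, 40, 60], gamma=0.2)
# around an optimizer such as torch.optim.RAdam(model.parameters(), lr=0.001).
```

Under this reading, the learning rate is 1e-3 for epochs 0-19, 2e-4 for 20-39, 4e-5 for 40-59, and 8e-6 for the final ten epochs.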