Overcoming Catastrophic Interference using Conceptor-Aided Backpropagation

Authors: Xu He, Herbert Jaeger

ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiment results of two benchmark tests showed highly competitive performance of CAB. Section 4 compares its performance on the permuted and disjoint MNIST tasks to recent methods that address the same problem. To test the performance of CAB, we evaluated it on the permuted MNIST experiment...
Researcher Affiliation | Academia | Xu He, Herbert Jaeger, Department of Computer Science and Electrical Engineering, Jacobs University Bremen, Bremen, 28759, Germany. {x.he,h.jaeger}@jacobs-university.de
Pseudocode | Yes | Algorithm 1: the forward procedure of conceptor-aided backprop, adapted from the traditional backprop. Algorithm 2: the backward procedure of conceptor-aided backprop for the j-th task, adapted from the traditional backprop. (A hedged sketch of the projected update these procedures implement is given after the table.)
Open Source Code | No | The paper provides a GitHub link to the EWC baseline implementation by Seff (2017), but no link or availability statement for the source code of its own proposed CAB method.
Open Datasets | Yes | To test the performance of CAB, we evaluated it on the permuted MNIST experiment... where a sequence of pattern recognition tasks is created from the MNIST dataset (Le Cun et al., 1998). Yann Le Cun, Corinna Cortes, and Christopher JC Burges. The MNIST database of handwritten digits. 1998. http://yann.lecun.com/exdb/mnist/. (The permuted-MNIST task construction is sketched after the table.)
Dataset Splits | No | The paper discusses training and testing data, but does not provide specific percentages or sample counts for training, validation, or test splits. It refers to 'permuted MNIST datasets' and 'disjoint MNIST datasets' but not the explicit ratios for data partitioning.
Hardware Specification | No | The paper states 'the time taken to compute a conceptor from the entire MNIST training set... is 0.42 seconds of standard notebook CPU time on average.' This is too general to count as a hardware specification: no CPU model, memory size, or GPU is given. (A sketch of the conceptor computation itself follows the table.)
Software Dependencies | No | The paper mentions using 'Vanilla SGD' and refers to an implementation of EWC, but does not provide specific version numbers for any software dependencies used for their own method (CAB).
Experiment Setup | Yes | For the permuted MNIST experiment: learning rate and aperture were set to 0.1 and 4, respectively; the parameters chosen for the EWC algorithm were 0.01 for the learning rate and 15 for the weight of the Fisher penalty term. For the disjoint MNIST experiment: the aperture α = 9 was used for all conceptors on all layers, and the learning rate η and regularization coefficient λ were chosen to be 0.1 and 0.005, respectively. Vanilla SGD was used in all experiments to optimize the cost function. (These values appear in the update-rule sketch after the table.)
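
The conceptors referred to throughout the table are computed, in the conceptor framework the paper builds on, as C = R(R + α⁻²I)⁻¹, where R is the correlation matrix of a set of activation vectors and α is the aperture. Below is a minimal NumPy sketch of that computation; the function name, the row-wise data layout, and the random example data are our own assumptions, not the authors' code.

```python
import numpy as np

def conceptor(X, aperture):
    """Conceptor C = R (R + aperture^-2 I)^-1 of the activation
    vectors stored as the rows of X (shape: n_samples x d)."""
    n, d = X.shape
    R = X.T @ X / n                                     # correlation matrix
    return R @ np.linalg.inv(R + aperture ** -2 * np.eye(d))

# Example with random stand-in activations and the aperture 4 quoted
# for the permuted MNIST experiment.
X = np.random.randn(1000, 50)
C = conceptor(X, aperture=4.0)
F = np.eye(50) - C   # "free" (NOT-conceptor) space left for new tasks
```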
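
Algorithms 1 and 2 quoted under 'Pseudocode' adapt ordinary backpropagation so that, when a new task is trained, each weight increment is confined to the linear subspace left free by the conceptors of earlier tasks. The single-layer sketch below illustrates that projected SGD step under our own assumptions about shapes and the ridge term; it reuses the quoted η = 0.1 and λ = 0.005 and the free-space matrix from the previous snippet, but it is not the authors' implementation.

```python
import numpy as np

def cab_step(W, a_prev, delta, F_prev, lr=0.1, reg=0.005):
    """One vanilla-SGD step in the spirit of conceptor-aided backprop.
    The input activations are projected onto the free space F_prev = I - C
    of the layer below before the outer-product gradient is formed, so
    directions already used by earlier tasks are (approximately) preserved.
    W      : weights, shape (n_out, n_in)
    a_prev : activations of the layer below, shape (n_in, batch)
    delta  : backpropagated errors at this layer, shape (n_out, batch)
    F_prev : free-space matrix for the layer below, shape (n_in, n_in)
    """
    batch = a_prev.shape[1]
    grad = delta @ (F_prev @ a_prev).T / batch       # projected gradient
    return W - lr * (grad + reg * W)                 # SGD with L2 ridge term

# Toy usage (shapes only, no real training loop):
n_in, n_out, batch = 50, 10, 32
W = 0.1 * np.random.randn(n_out, n_in)
a_prev = np.random.randn(n_in, batch)
delta = np.random.randn(n_out, batch)
F_prev = np.eye(n_in)   # before the first task, the whole space is free
W = cab_step(W, a_prev, delta, F_prev)
```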
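
The permuted MNIST benchmark quoted under 'Open Datasets' builds each task by applying one fixed random permutation of the 784 pixel positions to every MNIST image. A small sketch of that construction follows; the function name, the stand-in data, and the number of tasks are illustrative only, and in practice the images would come from http://yann.lecun.com/exdb/mnist/.

```python
import numpy as np

def make_permuted_tasks(images, labels, n_tasks, seed=0):
    """Build a list of permuted MNIST tasks from flattened images
    (shape: n_samples x 784); each task uses its own fixed pixel
    permutation, while the labels stay unchanged."""
    rng = np.random.default_rng(seed)
    tasks = []
    for _ in range(n_tasks):
        perm = rng.permutation(images.shape[1])   # one permutation per task
        tasks.append((images[:, perm], labels))
    return tasks

# Example with random stand-in data instead of the real MNIST arrays.
images = np.random.rand(1000, 784).astype(np.float32)
labels = np.random.randint(0, 10, size=1000)
tasks = make_permuted_tasks(images, labels, n_tasks=5)
```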