Efficient Lifelong Learning with A-GEM

Authors: Arslan Chaudhry, Marc’Aurelio Ranzato, Marcus Rohrbach, Mohamed Elhoseiny

ICLR 2019

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our experiments on several standard lifelong learning benchmarks demonstrate that A-GEM has the best trade-off between accuracy and efficiency. |
| Researcher Affiliation | Collaboration | University of Oxford; Facebook AI Research |
| Pseudocode | Yes | Algorithm 1: Learning and Evaluation Protocols |
| Open Source Code | Yes | The code is available at https://github.com/facebookresearch/agem. |
| Open Datasets | Yes | Permuted MNIST (Kirkpatrick et al., 2016) is a variant of the MNIST (LeCun, 1998) dataset of handwritten digits, where each task applies a fixed random permutation of the input pixels to all images of that task. Split CIFAR (Zenke et al., 2017) consists of splitting the original CIFAR-100 dataset (Krizhevsky & Hinton, 2009) into 20 disjoint subsets... |
| Dataset Splits | Yes | As described in Sec. 2 and outlined in Alg. 1, in order to cross-validate we use the first 3 tasks, and then report metrics on the remaining 17 tasks after doing a single training pass over each task in sequence. |
| Hardware Specification | No | The paper notes that "The timing refers to training time on a GPU device" (Table 7), but does not specify the GPU model or any other hardware components such as CPU or memory. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., Python, TensorFlow/PyTorch, or CUDA versions). |
| Experiment Setup | Yes | In terms of architectures, we use a fully-connected network with two hidden layers of 256 ReLU units each for Permuted MNIST, a reduced ResNet18 for Split CIFAR like in Lopez-Paz & Ranzato (2017), and a standard ResNet18 (He et al., 2016) for Split CUB and Split AWA. For a given dataset stream, all models use the same architecture, and all models are optimized via stochastic gradient descent with mini-batch size equal to 10. [...] Below we report the hyper-parameter grid considered for different experiments. [...] The best setting for each experiment is reported in parentheses. |
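The Pseudocode row above points to the paper's Algorithm 1. The step that distinguishes A-GEM from GEM is a single gradient correction: when the gradient g on the current mini-batch conflicts with the gradient g_ref computed on a batch drawn from episodic memory (their dot product is negative), g is projected so that the memory loss does not increase. Below is a minimal NumPy sketch of that rule; the function name and the flattened-gradient representation are illustrative conveniences, not the authors' released implementation (their TensorFlow code lives in the repository linked in the table).

```python
import numpy as np

def agem_grad(g: np.ndarray, g_ref: np.ndarray) -> np.ndarray:
    """A-GEM gradient correction (illustrative sketch).

    g     -- flattened gradient on the current task's mini-batch
    g_ref -- flattened gradient on a batch sampled from episodic memory
    """
    dot = np.dot(g, g_ref)
    if dot >= 0.0:
        # No interference with past tasks: use the gradient unchanged.
        return g
    # Remove the component of g along g_ref, so the (approximate)
    # loss on the episodic memory does not increase.
    return g - (dot / np.dot(g_ref, g_ref)) * g_ref
```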
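The Dataset Splits row quotes the paper's protocol: the first 3 tasks of a stream are used only to select hyper-parameters, and metrics are reported on the remaining 17 tasks after a single training pass over each. A small self-contained sketch of that split, assuming a 20-task stream such as Split CIFAR (the function and its defaults are illustrative, not from the paper's code):

```python
def evaluation_protocol(num_tasks: int = 20, num_cv_tasks: int = 3):
    """Sketch of the stream split the paper describes for Split CIFAR.

    The first `num_cv_tasks` tasks are used only to cross-validate
    hyper-parameters; metrics are reported on the remaining tasks after
    a single training pass over each task in sequence.
    """
    tasks = list(range(num_tasks))
    return tasks[:num_cv_tasks], tasks[num_cv_tasks:]

cv_tasks, eval_tasks = evaluation_protocol()
print(len(cv_tasks), len(eval_tasks))  # 3 17
```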
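The Experiment Setup row describes the Permuted MNIST model: a fully-connected network with two hidden layers of 256 ReLU units, trained with plain SGD at mini-batch size 10. The following PyTorch sketch mirrors that configuration; PyTorch itself, the layer ordering, and the learning-rate value are assumptions (the paper selects the learning rate by cross-validation over a grid, and the released code uses a different framework).

```python
import torch.nn as nn
import torch.optim as optim

# Two hidden layers of 256 ReLU units each, as described for Permuted MNIST;
# the 784-dim input and 10-way output head match flattened MNIST digits.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

# Plain SGD with mini-batch size 10, per the paper's setup. The learning
# rate here is a placeholder: the paper cross-validates it on a grid.
optimizer = optim.SGD(model.parameters(), lr=0.1)
batch_size = 10
```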