Efficient Lifelong Learning with A-GEM
Authors: Arslan Chaudhry, Marc’Aurelio Ranzato, Marcus Rohrbach, Mohamed Elhoseiny
ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on several standard lifelong learning benchmarks demonstrate that A-GEM has the best trade-off between accuracy and efficiency. |
| Researcher Affiliation | Collaboration | University of Oxford, Facebook AI Research |
| Pseudocode | Yes | Algorithm 1 Learning and Evaluation Protocols |
| Open Source Code | Yes | The code is available at https://github.com/facebookresearch/agem. |
| Open Datasets | Yes | Permuted MNIST (Kirkpatrick et al., 2016) is a variant of MNIST (LeCun, 1998) dataset of handwritten digits where each task has a certain random permutation of the input pixels which is applied to all the images of that task. Split CIFAR (Zenke et al., 2017) consists of splitting the original CIFAR-100 dataset (Krizhevsky & Hinton, 2009) into 20 disjoint subsets... *(task construction sketched after the table)* |
| Dataset Splits | Yes | As described in Sec. 2 and outlined in Alg. 1, in order to cross validate we use the first 3 tasks, and then report metrics on the remaining 17 tasks after doing a single training pass over each task in sequence. *(protocol sketched after the table)* |
| Hardware Specification | No | The paper mentions "The timing refers to training time on a GPU device" in Table 7, but does not specify the GPU model or any other hardware components such as CPU or memory. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow, CUDA versions). |
| Experiment Setup | Yes | In terms of architectures, we use a fully-connected network with two hidden layers of 256 ReLU units each for Permuted MNIST, a reduced ResNet18 for Split CIFAR like in Lopez-Paz & Ranzato (2017), and a standard ResNet18 (He et al., 2016) for Split CUB and Split AWA. For a given dataset stream, all models use the same architecture, and all models are optimized via stochastic gradient descent with mini-batch size equal to 10. [...] Below we report the hyper-parameters grid considered for different experiments. ... The best setting for each experiment is reported in the parenthesis. *(architecture and training-loop sketch after the table)* |
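
The task streams quoted under "Open Datasets" can be built mechanically from the base datasets. The sketch below is a minimal illustration, assuming flattened MNIST arrays and CIFAR-100 arrays are already loaded; the function names, the random seed, and the 5-classes-per-task split are assumptions for illustration, not the authors' code.

```python
import numpy as np

def make_permuted_mnist_tasks(x, y, num_tasks, seed=0):
    """Permuted MNIST: each task fixes one random pixel permutation
    and applies it to every image of that task.

    x: (N, 784) flattened MNIST images, y: (N,) digit labels.
    """
    rng = np.random.RandomState(seed)
    tasks = []
    for _ in range(num_tasks):
        perm = rng.permutation(x.shape[1])     # one fixed permutation per task
        tasks.append((x[:, perm], y))
    return tasks

def make_split_cifar_tasks(x, y, num_tasks=20, classes_per_task=5):
    """Split CIFAR: partition CIFAR-100 into 20 disjoint class subsets.

    x: (N, 32, 32, 3) images, y: (N,) labels in [0, 100).
    """
    tasks = []
    for t in range(num_tasks):
        cls = np.arange(t * classes_per_task, (t + 1) * classes_per_task)
        idx = np.isin(y, cls)
        tasks.append((x[idx], y[idx]))
    return tasks
```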
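
The learning/evaluation protocol quoted under "Dataset Splits" (cross-validate on the first 3 tasks, then a single training pass over each remaining task) can be sketched as follows. `Model`, `train_single_pass`, and `evaluate` are hypothetical placeholders for the learner and its metrics; only the control flow is grounded in the quoted text.

```python
def run_protocol(tasks, hyperparam_grid, num_cv_tasks=3):
    # 1) Hyper-parameter selection using only the first few tasks.
    best_hp, best_acc = None, -1.0
    for hp in hyperparam_grid:
        model = Model(hp)                      # hypothetical learner
        for task in tasks[:num_cv_tasks]:
            train_single_pass(model, task)     # single pass over the task
        acc = sum(evaluate(model, t) for t in tasks[:num_cv_tasks]) / num_cv_tasks
        if acc > best_acc:
            best_hp, best_acc = hp, acc

    # 2) Train on the remaining tasks in sequence with the chosen setting,
    #    one pass over each, and report metrics on those tasks only.
    model = Model(best_hp)
    eval_tasks = tasks[num_cv_tasks:]
    for task in eval_tasks:
        train_single_pass(model, task)
    return [evaluate(model, t) for t in eval_tasks]
```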
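
For the Permuted MNIST setup quoted under "Experiment Setup" (two hidden layers of 256 ReLU units, SGD, mini-batch size 10), a minimal sketch is given below. It is written in PyTorch purely for illustration, the learning rate is an assumption since the paper's hyper-parameter grid is elided above, and the A-GEM gradient projection itself is not shown.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Fully-connected net with two hidden layers of 256 ReLU units each."""
    def __init__(self, in_dim=784, hidden=256, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x):
        return self.net(x)

model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # lr is illustrative, not from the paper
loss_fn = nn.CrossEntropyLoss()

def train_single_pass(x, y, batch_size=10):
    """One training pass over a task's data with mini-batches of size 10."""
    for i in range(0, len(x), batch_size):
        xb, yb = x[i:i + batch_size], y[i:i + batch_size]
        optimizer.zero_grad()
        loss_fn(model(xb), yb).backward()
        optimizer.step()
```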