Meta-Consolidation for Continual Learning

Authors: Joseph K J, Vineeth N Balasubramanian

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments with continual learning benchmarks of MNIST, CIFAR-10, CIFAR-100 and Mini-ImageNet datasets show consistent improvement over five baselines, including a recent state-of-the-art, corroborating the promise of MERLIN.
Researcher Affiliation | Academia | K J Joseph and Vineeth N Balasubramanian, Department of Computer Science and Engineering, Indian Institute of Technology Hyderabad, India. {cs17m18p100001,vineethnb}@iith.ac.in
Pseudocode | Yes | Algorithm 1: MERLIN: Overall Methodology; Algorithm 2: Meta-Consolidation in MERLIN; Algorithm 3: MERLIN Inference
Open Source Code | Yes | Our code¹ is implemented in PyTorch [60] and runs on a single NVIDIA V-100 GPU. (Footnote 1: https://github.com/JosephKJ/merlin)
Open Datasets | Yes | Five standard continual learning benchmarks, viz. Split MNIST [13], Permuted MNIST [88], Split CIFAR-10 [88], Split CIFAR-100 [63] and Split Mini-Imagenet [15], are used in the experiments, following recent continual learning literature [12, 4, 63, 51, 13].
Dataset Splits | No | While Section 3 generally mentions 'training, validation and test samples' for a task, the experimental setup in Section 4.1.1 (Datasets) only details training and test sets for the listed benchmarks (e.g., '1000 images per task for training and the model is evaluated on all test examples'). No specific validation splits, or how they were used, are provided.
Hardware Specification | Yes | Our code¹ is implemented in PyTorch [60] and runs on a single NVIDIA V-100 GPU.
Software Dependencies | No | The paper mentions 'PyTorch [60]' but does not provide a specific version number for PyTorch or any other software dependency.
Experiment Setup | Yes | For the MNIST dataset, we use a two-layer fully connected neural network with 100 neurons each, with ReLU activation... batch size is set to 10 and Adam [35] is used as the optimizer, with an initial learning rate of 0.001 and weight decay of 0.001. ... trained only for a single epoch... We use a chunk size of 300 for all experiments... AdaGrad [20] is used as the optimizer with an initial learning rate of 0.001. Batch size is set to 1 and the VAE network is trained for 25 epochs. At test time, we sample 30 models from the trained decoder.
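
The Open Datasets and Dataset Splits rows quote a Split MNIST-style protocol with a fixed per-task training budget. Below is a minimal sketch of how such task splits could be constructed; it assumes torchvision for data loading, consecutive class pairs per task, and random subsampling to the stated 1000-image budget. None of this is taken from the authors' repository.

```python
# Sketch: build Split MNIST tasks (class pairs) with a per-task training budget.
# Assumptions: torchvision data loaders, consecutive class pairing, random
# subsampling to `images_per_task` -- illustrative only, not the authors' code.
import random
import torch
from torchvision import datasets, transforms


def split_mnist_tasks(root="./data", images_per_task=1000, classes_per_task=2):
    tfm = transforms.ToTensor()
    train = datasets.MNIST(root, train=True, download=True, transform=tfm)
    test = datasets.MNIST(root, train=False, download=True, transform=tfm)

    tasks = []
    for t in range(10 // classes_per_task):
        task_classes = set(range(t * classes_per_task, (t + 1) * classes_per_task))
        train_idx = [i for i, y in enumerate(train.targets.tolist()) if y in task_classes]
        test_idx = [i for i, y in enumerate(test.targets.tolist()) if y in task_classes]
        # Subsample the training pool to the quoted per-task budget;
        # evaluation uses all test examples of the task's classes.
        train_idx = random.sample(train_idx, min(images_per_task, len(train_idx)))
        tasks.append((torch.utils.data.Subset(train, train_idx),
                      torch.utils.data.Subset(test, test_idx)))
    return tasks
```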
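
The Experiment Setup row above quotes the MNIST base-classifier configuration: a two-layer fully connected network with 100 neurons each (interpreted here as two hidden layers of 100 units), ReLU activations, batch size 10, Adam with learning rate 0.001 and weight decay 0.001, trained for a single epoch per task. The sketch below covers only this base-network stage; the VAE/meta-consolidation part of MERLIN (AdaGrad, chunk size 300, 30 sampled models at test time) is not reproduced, and all class and function names are illustrative.

```python
# Sketch of the quoted MNIST base-classifier setup. Names are illustrative;
# this is not the authors' implementation of MERLIN.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader


class BaseNet(nn.Module):
    """Two hidden layers of 100 units with ReLU, as quoted for MNIST."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 100), nn.ReLU(),
            nn.Linear(100, 100), nn.ReLU(),
            nn.Linear(100, num_classes),
        )

    def forward(self, x):
        return self.net(x)


def train_single_epoch(model, train_set, device="cpu"):
    # Batch size 10, Adam with lr=0.001 and weight decay=0.001, one epoch,
    # following the hyperparameters quoted in the table.
    loader = DataLoader(train_set, batch_size=10, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return model
```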