Meta-Consolidation for Continual Learning
Authors: Joseph K J, Vineeth N Balasubramanian
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments with continual learning benchmarks of MNIST, CIFAR-10, CIFAR-100 and Mini-ImageNet datasets show consistent improvement over five baselines, including a recent state-of-the-art, corroborating the promise of MERLIN. |
| Researcher Affiliation | Academia | K J Joseph and Vineeth N Balasubramanian, Department of Computer Science and Engineering, Indian Institute of Technology Hyderabad, India, {cs17m18p100001,vineethnb}@iith.ac.in |
| Pseudocode | Yes | Algorithm 1 MERLIN: Overall Methodology; Algorithm 2 META-CONSOLIDATION IN MERLIN; Algorithm 3 MERLIN INFERENCE |
| Open Source Code | Yes | Our code is implemented in PyTorch [60] and runs on a single NVIDIA V-100 GPU. (Footnote 1: https://github.com/JosephKJ/merlin) |
| Open Datasets | Yes | Five standard continual learning benchmarks, viz. Split MNIST [13], Permuted MNIST [88], Split CIFAR-10 [88], Split CIFAR-100 [63] and Split Mini-Imagenet [15], are used in the experiments, following recent continual learning literature [12, 4, 63, 51, 13]. |
| Dataset Splits | No | While Section 3 generally mentions 'training, validation and test samples' for a task, the experimental setup in Section 4.1.1 (Datasets) only details the use of training and test sets for the listed benchmarks (e.g., '1000 images per task for training and the model is evaluated on all test examples'). The paper does not specify validation splits or how they were used. |
| Hardware Specification | Yes | Our code is implemented in PyTorch [60] and runs on a single NVIDIA V-100 GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch [60]' but does not provide a specific version number for PyTorch or any other software dependency. |
| Experiment Setup | Yes | For the MNIST dataset, we use a two-layer fully connected neural network with 100 neurons each, with ReLU activation... batch size is set to 10 and Adam [35] is used as the optimizer, with an initial learning rate of 0.001 and weight decay of 0.001. ... trained only for a single epoch... We use a chunk size of 300 for all experiments... AdaGrad [20] is used as the optimizer with an initial learning rate of 0.001. Batch size is set to 1 and the VAE network is trained for 25 epochs. At test time, we sample 30 models from the trained decoder. |
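
The following is a minimal PyTorch sketch of the MNIST base-learner setup quoted in the Experiment Setup row. The class and function names (`SimpleMLP`, `train_task`) and the dummy data pipeline are illustrative assumptions, not the authors' code; only the hyperparameters (two 100-neuron ReLU layers, Adam with learning rate 0.001 and weight decay 0.001, batch size 10, single-epoch training per task) come from the quoted setup. The meta/VAE side of MERLIN (chunk size 300, AdaGrad at 0.001, 25 epochs, 30 sampled models at test time) is not sketched because its architecture is not described in this excerpt.

```python
# Hedged sketch of the quoted MNIST base-learner setup.
# Names and data pipeline are illustrative assumptions; only the
# hyperparameters come from the paper excerpt above.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


class SimpleMLP(nn.Module):
    """Two fully connected layers of 100 neurons each with ReLU, as described for MNIST."""

    def __init__(self, in_dim: int = 784, hidden: int = 100, num_classes: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def train_task(model: nn.Module, loader: DataLoader) -> None:
    """Single pass over one task's data: 'trained only for a single epoch'."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-3)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()


if __name__ == "__main__":
    # Dummy tensors stand in for one task's 1000 training images (Split/Permuted MNIST).
    x = torch.rand(1000, 1, 28, 28)
    y = torch.randint(0, 10, (1000,))
    loader = DataLoader(TensorDataset(x, y), batch_size=10, shuffle=True)
    model = SimpleMLP()
    train_task(model, loader)
```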