Lifelong Neural Predictive Coding: Learning Cumulatively Online without Forgetting
Authors: Alex Ororbia, Ankur Mali, C Lee Giles, Daniel Kifer
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we demonstrate that our self-organizing system experiences significantly less forgetting compared to standard neural models, outperforming a swath of previously proposed methods, including rehearsal/data buffer-based methods, on both standard (Split MNIST, Split Fashion MNIST, etc.) and custom benchmarks even though it is trained in a stream-like fashion. |
| Researcher Affiliation | Academia | Alexander G. Ororbia, Rochester Institute of Technology, Rochester, NY 14623, USA (ago@cs.rit.edu); Ankur Mali, University of South Florida, Tampa, FL 33620, USA (ankurarjunmali@usf.edu); C. Lee Giles, The Pennsylvania State University, State College, PA 16801, USA (clg20@psu.edu); Daniel Kifer, The Pennsylvania State University, State College, PA 16801, USA (duk17@psu.edu) |
| Pseudocode | Yes | The pseudocode illustrating how the elements described so far are combined in an S-NCN system is presented in Algorithms 1 and 2. |
| Open Source Code | No | The code and the data are proprietary. We will provide complete codebase to support this paper upon acceptance. |
| Open Datasets | Yes | We create task sequences by breaking apart MNIST (M), Fashion MNIST (FM), and Google Draw (GD) each into two sub-tasks (e.g., for MNIST, M1 and M2)...we experimented with a wide swath of approaches on three benchmarks Split MNIST, Split Not MNIST, and Split Fashion MNIST (FMNIST). |
| Dataset Splits | Yes | Modern-day connectionist systems are typically trained on a fixed pool of data samples, collected in controlled environments, in isolation and random order, and then evaluated on a separate validation data pool...For each baseline, we tuned hyper-parameters based on their accuracy on each task's development set. |
| Hardware Specification | No | The main text mentions "computing infrastructure" in the appendix, but does not specify any particular hardware models (e.g., GPU or CPU types) used for the experiments within the main body of the paper. |
| Software Dependencies | No | The paper mentions software components like "multilayer perceptrons (MLPs)" and "stochastic gradient descent" but does not specify version numbers for any libraries, frameworks, or operating systems used (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | In our experiments, we train models with three hidden layers, whether they be multilayer perceptrons (MLPs) or S-NCNs, and compare against baselines from the literature. All models were restricted to contain (a maximum of) 500 units per layer. For the S-NCN, weights were initialized from a Gaussian distribution scaled by each layer's fan-in and were optimized using stochastic gradient descent with a learning rate of λ = 0.01. Baseline models were trained on each task for 40 epochs...hyper-parameters were β = 0.05, K = 10, ηg = 0.9, ηe = 0.01, α = 0.98. |
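The Split-style benchmarks quoted above (Split MNIST, Split Fashion MNIST, etc.) are built by partitioning a labeled dataset by class into sub-task streams. A minimal NumPy sketch of that construction follows; the toy arrays and the particular digit grouping into two sub-tasks (M1/M2) are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def split_by_labels(x, y, label_groups):
    """Partition a labeled dataset into sub-tasks by class label,
    as in Split MNIST-style task sequences."""
    return [(x[np.isin(y, group)], y[np.isin(y, group)])
            for group in label_groups]

# Hypothetical toy data standing in for MNIST images and labels.
x = np.arange(10).reshape(10, 1)
y = np.arange(10)

# Two sub-tasks (e.g., M1: classes 0-4, M2: classes 5-9) -- an assumed grouping.
tasks = split_by_labels(x, y, [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]])
```

Each element of `tasks` is one sub-task's `(x, y)` pool, which can then be presented to the learner in a stream-like fashion, one task after another.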
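As a rough illustration of the quoted setup (three hidden layers capped at 500 units, fan-in-scaled Gaussian initialization, plain SGD with λ = 0.01), the following NumPy sketch shows those pieces; the input/output dimensions and helper names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
LAMBDA = 0.01  # SGD learning rate reported in the paper

# Three hidden layers of (at most) 500 units; 784-in / 10-out is an assumption.
layer_sizes = [784, 500, 500, 500, 10]

# Gaussian initialization with standard deviation scaled by each layer's fan-in.
weights = [rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_in, fan_out))
           for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def sgd_step(weights, grads, lr=LAMBDA):
    """One plain stochastic gradient descent update over all layers."""
    return [W - lr * g for W, g in zip(weights, grads)]
```

This is only the parameterization and optimizer; the S-NCN's predictive-coding inference dynamics (governed by β, K, ηg, ηe, α) are described in the paper's Algorithms 1 and 2.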