Lifelong Neural Predictive Coding: Learning Cumulatively Online without Forgetting

Authors: Alex Ororbia, Ankur Mali, C. Lee Giles, Daniel Kifer

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we demonstrate that our self-organizing system experiences significantly less forgetting compared to standard neural models, outperforming a swath of previously proposed methods, including rehearsal/data buffer-based methods, on both standard (Split MNIST, Split Fashion MNIST, etc.) and custom benchmarks even though it is trained in a stream-like fashion.
Researcher Affiliation | Academia | Alexander G. Ororbia, Rochester Institute of Technology, Rochester, NY 14623, USA (ago@cs.rit.edu); Ankur Mali, University of South Florida, Tampa, FL 33620, USA (ankurarjunmali@usf.edu); C. Lee Giles, The Pennsylvania State University, State College, PA 16801, USA (clg20@psu.edu); Daniel Kifer, The Pennsylvania State University, State College, PA 16801, USA (duk17@psu.edu)
Pseudocode | Yes | The pseudocode illustrating how the elements described so far are combined in an S-NCN system is presented in Algorithms 1 and 2.
Open Source Code | No | The code and the data are proprietary. We will provide complete codebase to support this paper upon acceptance.
Open Datasets | Yes | We create task sequences by breaking apart MNIST (M), Fashion MNIST (FM), and Google Draw (GD) each into two sub-tasks (e.g., for MNIST, M1 and M2)... we experimented with a wide swath of approaches on three benchmarks: Split MNIST, Split Not MNIST, and Split Fashion MNIST (FMNIST). (A minimal class-split sketch is given after the table.)
Dataset Splits | Yes | Modern-day connectionist systems are typically trained on a fixed pool of data samples, collected in controlled environments, in isolation and random order, and then evaluated on a separate validation data pool... For each baseline, we tuned hyper-parameters based on their accuracy on each task's development set.
Hardware Specification | No | The paper mentions "computing infrastructure" in the appendix, but does not specify any particular hardware models (e.g., GPU or CPU types) used for the experiments in the main body of the paper.
Software Dependencies | No | The paper mentions methods such as multilayer perceptrons (MLPs) and stochastic gradient descent, but does not specify version numbers for any libraries, frameworks, or operating systems used (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | In our experiments, we train models with three hidden layers, whether they be multilayer perceptrons (MLPs) or S-NCNs, and compare against baselines from the literature. All models were restricted to contain (a maximum of) 500 units per layer. For the S-NCN, weights were initialized from a Gaussian distribution scaled by each layer's fan-in and were optimized using stochastic gradient descent with a learning rate of λ = 0.01. Baseline models were trained on each task for 40 epochs... hyper-parameters were β = 0.05, K = 10, η_g = 0.9, η_e = 0.01, α = 0.98. (A configuration sketch is given after the table.)
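
The class-based splitting described in the Open Datasets row can be reproduced with a few lines of array indexing. The sketch below is illustrative only: it assumes MNIST-style images and labels are already loaded as NumPy arrays, and the helper name split_by_classes and the particular two-group split (digits 0-4 vs. 5-9) are assumptions for illustration, not the authors' released code.

```python
# Minimal sketch of class-based task splitting (e.g., Split MNIST).
# Assumes `images` has shape (N, 784) and `labels` has shape (N,).
import numpy as np

def split_by_classes(images, labels, class_groups):
    """Return one (images, labels) pair per task, keeping only that task's classes."""
    tasks = []
    for group in class_groups:
        mask = np.isin(labels, list(group))  # boolean mask selecting this task's classes
        tasks.append((images[mask], labels[mask]))
    return tasks

# Example: two MNIST sub-tasks, as in the paper's M1/M2 description
# (the 0-4 / 5-9 grouping is an assumed, illustrative choice).
# m1, m2 = split_by_classes(train_images, train_labels, [range(0, 5), range(5, 10)])
```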
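
The Experiment Setup row lists the architectural and optimization choices almost completely. The following is a minimal sketch of that configuration, assuming NumPy: three hidden layers of 500 units, Gaussian weights scaled by each layer's fan-in (the 1/sqrt(fan-in) standard deviation is an assumed reading of "scaled by fan-in"), and plain SGD with learning rate 0.01. The function names and the hyper-parameter dictionary are illustrative; how β, K, η_g, η_e, and α enter the S-NCN update rules is defined in the paper itself, not here.

```python
import numpy as np

rng = np.random.default_rng(0)  # the seed is an arbitrary choice for this sketch

def init_weights(layer_sizes):
    """Gaussian weights with standard deviation scaled by each layer's fan-in."""
    return [rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_in, fan_out))
            for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def sgd_step(weights, grads, lr=0.01):
    """One stochastic gradient descent update at the reported learning rate."""
    return [W - lr * dW for W, dW in zip(weights, grads)]

# Input layer, three hidden layers of (at most) 500 units, and a 10-class output.
layer_sizes = [784, 500, 500, 500, 10]
weights = init_weights(layer_sizes)

# Hyper-parameters quoted in the row above, recorded as plain values.
sncn_hparams = {"beta": 0.05, "K": 10, "eta_g": 0.9, "eta_e": 0.01, "alpha": 0.98}
```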