Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Compact Memory for Continual Logistic Regression

Authors: Yohan Jung, Hyungi Lee, Wenlong Chen, Thomas Möllenhoff, Yingzhen Li, Juho Lee, Mohammad Emtiyaz Khan

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show the following results: (i) multi-output linear regression as a sanity check, (ii) binary logistic regression on two datasets, a toy dataset and USPS, and (iii) multi-class logistic regression on four datasets based on CIFAR-10, CIFAR-100, Tiny ImageNet-200, and ImageNet-100. The multi-output regression result is included in App. B.1, where we compare to the vanilla SVD and show that our method is numerically more robust. The remaining results are described in the next section.
Researcher Affiliation | Academia | RIKEN Center for AI Project; Kookmin University; Imperial College London; KAIST. Work performed in part during an internship at the RIKEN Center for AI Project.
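Pseudocode | Yes | Algorithm 1: Hessian Matching by PPCA
Require: $\Phi_{t+1}$, $U_t$, $w_t$, and $\epsilon$
1: $W \leftarrow \mathrm{Diag}(w_t)$ and $U \leftarrow U_t W^{1/2}$
2:
3: $T \leftarrow (\Phi_{t+1}, U_t W^{1/2})$
4: $S \leftarrow T T^\top$
5: while not converged do
6:
7: $M \leftarrow U^\top U + \epsilon I$
8: $U \leftarrow S U (\epsilon I + M^{-1} U^\top S U)^{-1}$
9:
10: end while
11: $w \leftarrow \mathrm{diag}(U^\top U)$ and $W \leftarrow \mathrm{Diag}(w)$
12: $U \leftarrow U W^{-1/2}$
13: return $U_{t+1} \leftarrow U$ and $w_{t+1} \leftarrow w$
Algorithm 2: Extension for Logistic Regression
Require: $\Phi_{t+1}$, $U_t$, $w_t$, $\epsilon$, and $\theta_{t+1}$
1: $W \leftarrow \mathrm{Diag}(w_t)$ and $U \leftarrow U_t$
2: $\lambda_1 \leftarrow \hat{y}'(\Phi_{t+1}^\top \theta_{t+1})$ and $\lambda_2 \leftarrow \hat{y}'(U_t^\top \theta_{t+1})$
3: $T \leftarrow (\Phi_{t+1} \mathrm{Diag}(\lambda_1)^{1/2}, U_t W^{1/2} \lambda_2^{1/2})$
4: $S \leftarrow T T^\top$
5: while not converged do
6: $\lambda \leftarrow \hat{y}'(U^\top \theta_{t+1})$ and $U \leftarrow U (W \lambda)^{1/2}$
7: $M \leftarrow U^\top U + \epsilon I$
8: $U \leftarrow S U (\epsilon I + M^{-1} U^\top S U)^{-1}$
9: $U \leftarrow U \lambda^{-1/2}$
10: $w \leftarrow \mathrm{diag}(U^\top U)$ and $W \leftarrow \mathrm{Diag}(w)$
11: $U \leftarrow U W^{-1/2}$
12: end while
13: return $U_{t+1} \leftarrow U$ and $w_{t+1} \leftarrow w$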
Open Source Code | Yes | Our code is available at https://github.com/team-approx-bayes/compact_memory_code.
Open Datasets | Yes | For instance, on Split-ImageNet, we get 60% accuracy compared to 30% obtained by replay with a memory size equivalent to 0.3% of the data size. We start with the toy example on the four-moon dataset and then show results for the USPS dataset. 1. Split-CIFAR-10 is a sequence of 5 tasks constructed by dividing 50,000 examples of CIFAR-10 into non-overlapping subsets. 2. Split-CIFAR-100 is a sequence of 20 tasks constructed by dividing 50,000 examples of CIFAR-100 into non-overlapping subsets. 3. Split-Tiny ImageNet-200 is a sequence of 20 tasks constructed by dividing 100,000 examples of Tiny ImageNet-200 into non-overlapping subsets. 4. Split-ImageNet-1000 is a sequence of 10 tasks constructed by dividing 1,281,167 examples of ImageNet-1000 into non-overlapping subsets. Multi-output linear regression: We show that our EM algorithm for Hessian matching achieves compact memory. We conduct a continual multi-label regression task on Split-MNIST [Zenke et al., 2017], where the MNIST dataset is divided into five subsets, each corresponding to a binary classification task with label pairs (0,1), (2,3), (4,5), (6,7), and (8,9). Following the benchmark experiment setting in [Carta et al., 2023], we also consider Permuted-MNIST, which consists of a sequence of 5 tasks constructed by permuting the pixels of the 60,000 MNIST training samples for each task.
Dataset Splits | Yes | Four-moon dataset. We split the four-moon dataset into four binary tasks and perform continual logistic regression over them. The training set of each task consists of 500 data points, and we test over all inputs on a 2-D grid over $(-3.2, 3.2) \times (-1.2, 1.2)$ in the input space (158,632 points in total). USPS odd vs. even datasets. The USPS dataset is a 16×16 image dataset of digits from 0 to 9. We relabel each digit according to whether it is even or odd and consider continual logistic regression over a sequence of 5 tasks, (0, 1) → (2, 3) → (4, 5) → (6, 7) → (8, 9). The training set of each task consists of 1000 data points, and the test set of each task consists of 300 data points. 1. Split-CIFAR-10 is a sequence of 5 tasks constructed by dividing 50,000 examples of CIFAR-10 into non-overlapping subsets. Each task contains 2 consecutive classes. 2. Split-CIFAR-100 is a sequence of 20 tasks constructed by dividing 50,000 examples of CIFAR-100 into non-overlapping subsets. Each task contains 5 consecutive classes. 3. Split-Tiny ImageNet-200 is a sequence of 20 tasks constructed by dividing 100,000 examples of Tiny ImageNet-200 into non-overlapping subsets. Each task contains 10 consecutive classes. 4. Split-ImageNet-1000 is a sequence of 10 tasks constructed by dividing 1,281,167 examples of ImageNet-1000 into non-overlapping subsets. Each task contains 100 consecutive classes. Multi-output linear regression: We conduct a continual multi-label regression task on Split-MNIST [Zenke et al., 2017], where the MNIST dataset is divided into five subsets, each corresponding to a binary classification task with label pairs (0,1), (2,3), (4,5), (6,7), and (8,9). Permuted-MNIST consists of a sequence of 5 tasks constructed by permuting the pixels of the 60,000 MNIST training samples for each task.
Hardware Specification | Yes | We use an NVIDIA RTX 6000 Ada for the experiments on Permuted-MNIST and Split-Tiny ImageNet, and an NVIDIA RTX 3090 for all other experiments.
Software Dependencies | No | We use the scikit-learn package.
Experiment Setup | Yes | For learning the model parameters $\theta_t$, we use the Adam optimizer with learning rate $\text{lr} = 10^{-3}$ and 100 epochs per task. For the weight-space regularization hyperparameter, we use $\delta = 0.01$. For learning the memory $U_{t+1}$, we run 10,000 iterations per task. For the noise hyperparameter of the EM algorithm, we use $\epsilon = 10^{-3}$.
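Algorithm 1 in the Pseudocode entry has the form of the standard EM iteration for probabilistic PCA, applied to $S = \Phi_{t+1}\Phi_{t+1}^\top + U_t \mathrm{Diag}(w_t) U_t^\top$, with $\epsilon$ playing the role of the noise variance. The NumPy sketch below follows its numbered lines under stated assumptions: features are stored as columns of `Phi`, the columns of `U_t` have unit norm, and the helper name `hessian_matching_ppca`, the tolerance `tol`, and the relative-change stopping rule standing in for "while not converged" are hypothetical choices, not taken from the authors' repository.

```python
import numpy as np


def hessian_matching_ppca(Phi, U_t, w_t, eps=1e-3, n_iter=10_000, tol=1e-10):
    """Sketch of Algorithm 1 as PPCA-style EM (hypothetical helper, not the authors' code).

    Phi : (d, n) array, features of the new task stored as columns.
    U_t : (d, k) array, current memory with unit-norm columns.
    w_t : (k,) array, positive memory weights.
    Returns (U_next, w_next); U_next has unit-norm columns and
    U_next @ diag(w_next) @ U_next.T approximates
    S = Phi @ Phi.T + U_t @ diag(w_t) @ U_t.T.
    """
    # Line 1: absorb the weights into the memory, U <- U_t W^{1/2}.
    U = U_t * np.sqrt(w_t)
    # Lines 3-4: T <- (Phi, U_t W^{1/2}) and S <- T T^T.
    T = np.hstack([Phi, U])
    S = T @ T.T
    eye = np.eye(U.shape[1])
    for _ in range(n_iter):
        # Line 7: M <- U^T U + eps I.
        M = U.T @ U + eps * eye
        SU = S @ U
        # Line 8: U <- S U (eps I + M^{-1} U^T S U)^{-1}, using solves instead of inverses.
        inner = eps * eye + np.linalg.solve(M, U.T @ SU)
        U_new = np.linalg.solve(inner.T, SU.T).T
        # Assumed stand-in for "while not converged": stop when U barely changes.
        converged = np.linalg.norm(U_new - U) <= tol * np.linalg.norm(U)
        U = U_new
        if converged:
            break
    # Lines 11-12: w <- diag(U^T U), then U <- U W^{-1/2} (unit-norm columns).
    w = np.sum(U * U, axis=0)
    return U / np.sqrt(w), w


rng = np.random.default_rng(0)
Phi = rng.standard_normal((6, 4))                      # d = 6 features, n = 4 new examples
U_t, _ = np.linalg.qr(rng.standard_normal((6, 2)))     # k = 2 orthonormal memory columns
w_t = np.array([0.7, 1.3])
U_next, w_next = hessian_matching_ppca(Phi, U_t, w_t)
S = Phi @ Phi.T + U_t @ np.diag(w_t) @ U_t.T
approx = U_next @ np.diag(w_next) @ U_next.T
print(np.linalg.norm(S - approx) / np.linalg.norm(S))  # relative error of the rank-2 memory
```

At the maximum-likelihood fixed point of PPCA, the stored product $U_{t+1}\mathrm{Diag}(w_{t+1})U_{t+1}^\top$ keeps the top-$k$ eigenvectors of $S$ with each eigenvalue reduced by $\epsilon$ (provided those eigenvalues exceed $\epsilon$), so a small $\epsilon$ makes the memory close to the best rank-$k$ approximation of $S$.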
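Lines 2-3 of Algorithm 2 in the Pseudocode entry weight each feature column by $\hat{y}'$ evaluated at the current parameters. Assuming $\hat{y}$ is the logistic sigmoid $\sigma$ and the loss is the summed binary cross-entropy (both assumptions, since the extracted text does not define $\hat{y}$), we have $\hat{y}'(z) = \sigma(z)(1 - \sigma(z))$, and $T T^\top$ with $T = \Phi\,\mathrm{Diag}(\lambda)^{1/2}$ is exactly the Hessian of that loss. The sketch below builds this factor and checks it against a finite-difference Hessian; `sigmoid`, `hessian_factor`, `bce_grad`, and `finite_difference_hessian` are hypothetical names.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def hessian_factor(Phi, theta):
    """T = Phi Diag(lam)^{1/2} with lam = sigma'(Phi^T theta) = sigma (1 - sigma)."""
    s = sigmoid(Phi.T @ theta)          # predicted probabilities, shape (n,)
    lam = s * (1.0 - s)                 # derivative of the sigmoid at each logit
    return Phi * np.sqrt(lam)           # scale column j of Phi by sqrt(lam_j)


def bce_grad(Phi, y, theta):
    """Gradient of the summed binary cross-entropy: Phi (sigma(Phi^T theta) - y)."""
    return Phi @ (sigmoid(Phi.T @ theta) - y)


def finite_difference_hessian(Phi, y, theta, h=1e-5):
    """Central differences of the gradient, one parameter at a time."""
    d = theta.size
    H = np.empty((d, d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = h
        H[:, j] = (bce_grad(Phi, y, theta + e) - bce_grad(Phi, y, theta - e)) / (2 * h)
    return H


rng = np.random.default_rng(0)
Phi = rng.standard_normal((3, 8))              # d = 3 features, n = 8 examples (columns)
y = rng.integers(0, 2, size=8).astype(float)   # binary targets
theta = rng.standard_normal(3)

T = hessian_factor(Phi, theta)
H_exact = T @ T.T
H_fd = finite_difference_hessian(Phi, y, theta)
print(np.max(np.abs(H_exact - H_fd)))          # small when the two Hessians agree
```

Under these assumptions, the matrix $S = T T^\top$ formed in Algorithm 2 combines this Hessian for the new task with a similarly weighted term for the existing memory.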
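The USPS protocol in the Dataset Splits entry pairs consecutive digits into five tasks and relabels each example by parity. A minimal sketch of that construction, assuming integer digit labels and encoding odd digits as 1 and even digits as 0 (an assumption; the source does not state which parity is the positive class); `odd_even_pair_tasks` is a hypothetical helper, and data loading is omitted.

```python
import numpy as np


def odd_even_pair_tasks(digits):
    """Return one (indices, binary_labels) pair per task (0,1) -> (2,3) -> ... -> (8,9)."""
    digits = np.asarray(digits)
    tasks = []
    for first in range(0, 10, 2):
        # Examples whose digit belongs to the current pair.
        idx = np.flatnonzero((digits == first) | (digits == first + 1))
        # Parity label: 1 for odd digits, 0 for even digits (assumed encoding).
        labels = digits[idx] % 2
        tasks.append((idx, labels))
    return tasks


digits = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 0])
tasks = odd_even_pair_tasks(digits)
print(tasks[0])   # indices of digits 0/1 and their parity labels
```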
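Each Split-* benchmark in the Dataset Splits entry partitions the classes into equally sized groups of consecutive classes (2 per task for CIFAR-10, 5 for CIFAR-100, 10 for Tiny ImageNet-200, and 100 for ImageNet-1000). A sketch of that partition, assuming "consecutive" means numeric label order; `split_by_consecutive_classes` is a hypothetical name, and dataset loading is left out.

```python
import numpy as np


def split_by_consecutive_classes(labels, n_tasks):
    """Partition example indices into n_tasks tasks of consecutive, non-overlapping classes."""
    labels = np.asarray(labels)
    classes = np.unique(labels)                 # sorted class ids
    if classes.size % n_tasks != 0:
        raise ValueError("number of classes must be divisible by n_tasks")
    per_task = classes.size // n_tasks
    tasks = []
    for t in range(n_tasks):
        task_classes = classes[t * per_task:(t + 1) * per_task]
        idx = np.flatnonzero(np.isin(labels, task_classes))
        tasks.append((task_classes, idx))
    return tasks


# Synthetic stand-in for CIFAR-10 labels: 10 classes, 3 examples each.
labels = np.repeat(np.arange(10), 3)
for task_classes, idx in split_by_consecutive_classes(labels, n_tasks=5):
    print(task_classes, idx.size)
```

Because every class is assigned to exactly one task, the resulting subsets are non-overlapping and together cover the whole training set, matching the description of each benchmark.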