Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Are Greedy Task Orderings Better Than Random in Continual Linear Regression?

Authors: Matan Tsipory, Ran Levinstein, Itay Evron, Mark Kong, Deanna Needell, Daniel Soudry

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we demonstrate that greedy orderings converge faster than random ones in terms of the average loss across tasks, both for linear regression with random data and for linear probing on CIFAR-100 classification tasks. Analytically, in a high-rank regression setting, we prove a loss bound for greedy orderings analogous to that of random ones. However, under general rank, we establish a repetition-dependent separation. Specifically, while prior work showed that for random orderings, with or without replacement, the average loss after k iterations is bounded by $O(1/\sqrt{k})$, we prove that single-pass greedy orderings may fail catastrophically, whereas those allowing repetition converge at rate $O(1/\sqrt[3]{k})$. Overall, we reveal nuances within and between greedy and random orderings.
Researcher Affiliation | Academia | Equal contribution. Technion, Haifa. University of California, Los Angeles.
Pseudocode | Yes | Scheme 1: Continual linear regression (to convergence). Initialize $w_0 = 0_d$. For each iteration $t = 1, \dots, k$: $w_t \leftarrow$ start from $w_{t-1}$ and minimize the current task's loss $L_{\tau(t)}(w) = \lVert X_{\tau(t)} w - y_{\tau(t)} \rVert^2$ with (S)GD to convergence. Output $w_k$.
Open Source Code | Yes | We provide a code snippet for the regression experiments in App. H. The code for the classification experiments is accessible at https://github.com/matants/greedy_ordering.
Open Datasets | Yes | Empirically, we demonstrate that greedy orderings converge faster than random ones in terms of the average loss across tasks, both for linear regression with random data and for linear probing on CIFAR-100 classification tasks. Classification tasks: CIFAR-100. We randomly partition classes into continual binary classification tasks, similarly to Li and Hiratani [62]. We train a linear probe on top of a ResNet-20 embedder, pretrained on the original CIFAR-100 multiclass task [45, 55].
Dataset Splits | Yes | We partition the 500 training samples of each CIFAR-100 class to two distinct groups of 250 samples, and use one of the groups to train the ResNet-20 embedder on the original CIFAR-100 multiclass task, using the same training recipe as Chen [22] and achieving 61.57% top-1 classification accuracy on the CIFAR-100 test set after 200 training epochs. The partitioning and training code is included in our provided repository. We then employ a linear probe on top of the resulting model (with the classification head removed), and construct the continual learning tasks using the 250 samples per class that weren't used for training the embedder.
Hardware Specification | Yes | All regression experiments, including those not shown, were completed within 4 hours on a home PC equipped with an Intel i5-9400F CPU and 16GB of RAM. All classification experiments were completed within a month's work on 4 NVIDIA GeForce GTX 1080 Ti GPUs.
Software Dependencies | No | The paper mentions 'ResNet-20 embedder' and 'Pytorch cifar models' implicitly through citations, and a code snippet in Appendix H uses 'numpy'. However, specific version numbers for these software dependencies are not explicitly listed in the provided text.
Experiment Setup | Yes | For each task we used the SGD optimizer with a learning rate of lr = 0.01 and ReduceLROnPlateau on epoch losses, trained for 40 epochs with a batch size of 64. As a baseline, we jointly trained a classifier on all tasks together, without regularization.
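
The scheme quoted in the Pseudocode row can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code from App. H. It assumes that running gradient descent to convergence on a task's squared loss from the previous iterate lands on the least-squares solution closest to that iterate, so each step is written in closed form with a pseudoinverse. The synthetic data, the function names, and the `greedy_max_loss` rule (pick the task with the largest current loss, repetition allowed) are all assumptions made for illustration; the greedy rules analyzed in the paper may differ.

```python
import numpy as np

def fit_to_convergence(w_prev, X, y):
    # Assumed closed form for "minimize with GD to convergence from w_prev":
    # the least-squares solution of X w = y that is closest to w_prev.
    return w_prev + np.linalg.pinv(X) @ (y - X @ w_prev)

def task_losses(w, tasks):
    # Squared loss of w on every task.
    return np.array([np.sum((X @ w - y) ** 2) for X, y in tasks])

def continual_regression(tasks, choose_task, k):
    d = tasks[0][0].shape[1]
    w = np.zeros(d)                       # w_0 = 0_d
    average_losses = []
    for _ in range(k):
        m = choose_task(w, tasks)         # task index chosen by the ordering
        X, y = tasks[m]
        w = fit_to_convergence(w, X, y)   # w_t from w_{t-1}
        average_losses.append(task_losses(w, tasks).mean())
    return w, average_losses

# Hypothetical jointly realizable tasks: every task shares the solution w_star.
rng = np.random.default_rng(0)
d, num_tasks, rows_per_task, k = 20, 10, 3, 50
w_star = rng.standard_normal(d)
tasks = []
for _ in range(num_tasks):
    X = rng.standard_normal((rows_per_task, d))
    tasks.append((X, X @ w_star))

def random_choice(w, tasks):
    # Random ordering with replacement.
    return int(rng.integers(len(tasks)))

def greedy_max_loss(w, tasks):
    # Illustrative greedy rule (an assumption): task with the largest current loss.
    return int(np.argmax(task_losses(w, tasks)))

w_random, random_losses = continual_regression(tasks, random_choice, k)
w_greedy, greedy_losses = continual_regression(tasks, greedy_max_loss, k)
print("final average loss, random:", random_losses[-1])
print("final average loss, greedy:", greedy_losses[-1])
```

Each update is a projection onto the current task's solution set, so the distance to the shared solution `w_star` never increases, whichever ordering is used.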
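
The per-class split described in the Dataset Splits row (500 training samples per class, divided into two disjoint groups of 250) might look roughly like the sketch below. It is not the partitioning code from the authors' repository. The function name `split_per_class`, the seed, and the synthetic label list standing in for the CIFAR-100 training labels are assumptions.

```python
import random
from collections import defaultdict

def split_per_class(labels, embedder_per_class=250, seed=0):
    """Split sample indices into two disjoint groups, class by class."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    embedder_indices, probe_indices = [], []
    for label in sorted(indices_by_class):
        indices = indices_by_class[label]
        rng.shuffle(indices)
        # First group trains the embedder; the rest is kept for the linear probe.
        embedder_indices.extend(indices[:embedder_per_class])
        probe_indices.extend(indices[embedder_per_class:])
    return embedder_indices, probe_indices

# Stand-in for the CIFAR-100 training labels: 100 classes, 500 samples each.
labels = [c for c in range(100) for _ in range(500)]
embedder_indices, probe_indices = split_per_class(labels)
print(len(embedder_indices), len(probe_indices))
```

Keeping the two groups disjoint means the probe's continual tasks are built only from samples the embedder never saw during training.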
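
The hyperparameters in the Experiment Setup row map onto a PyTorch training loop along these lines. This is a sketch only. The random features standing in for ResNet-20 embeddings, the ±1 targets, the squared loss, the single-output probe, and the default `ReduceLROnPlateau` settings are assumptions that the row above does not specify. Only the optimizer, learning rate, epoch count, batch size, and scheduler type come from the source.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def train_linear_probe(features, targets, epochs=40, batch_size=64, lr=0.01):
    probe = nn.Linear(features.shape[1], 1)
    optimizer = torch.optim.SGD(probe.parameters(), lr=lr)
    # Scheduler is stepped on the epoch loss; default patience/factor are assumed.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)
    loader = DataLoader(TensorDataset(features, targets),
                        batch_size=batch_size, shuffle=True)
    loss_fn = nn.MSELoss()  # assumption: squared loss on +/-1 targets

    epoch_losses = []
    for _ in range(epochs):
        running = 0.0
        for batch_x, batch_y in loader:
            optimizer.zero_grad()
            loss = loss_fn(probe(batch_x).squeeze(1), batch_y)
            loss.backward()
            optimizer.step()
            running += loss.item() * len(batch_x)
        epoch_loss = running / len(features)
        scheduler.step(epoch_loss)   # ReduceLROnPlateau on epoch losses
        epoch_losses.append(epoch_loss)
    return probe, epoch_losses

# Hypothetical binary task: random 16-dimensional features with +/-1 targets.
torch.manual_seed(0)
features = torch.randn(256, 16)
targets = torch.sign(features[:, 0])
probe, epoch_losses = train_linear_probe(features, targets)
print(epoch_losses[0], epoch_losses[-1])
```

In a continual run, the same loop would be called once per task, starting each call from the probe weights left by the previous task rather than from a fresh `nn.Linear`.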