Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Predicting the Susceptibility of Examples to Catastrophic Forgetting

Authors: Guy Hacohen, Tinne Tuytelaars

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our key observation is a last-in-first-out forgetting pattern: examples learned later are more prone to forgetting, while earlier-learned ones are preserved. This aligns with the simplicity bias of neural networks (Shah et al., 2020; Szegedy et al., 2014), where simpler examples are learned first. As a result, simple examples are consistently remembered, while more complex ones are forgotten as new data arrives. This pattern holds across a wide range of architectures, datasets, and training configurations, including variations in learning rates, optimizers, schedulers, epochs, and regularization strategies (see 2, App. D). Fig. 1 visualizes remembered vs. forgotten examples in CIFAR-100.
Researcher Affiliation | Academia | ESAT-PSI, KU Leuven, Belgium. Correspondence to: Guy Hacohen <EMAIL>, Tinne Tuytelaars <EMAIL>.
Pseudocode | Yes | Algorithm 1: Training CL method with SBS. Input: D_t, |B|, E, number of quick/slow examples to remove q, s. Output: buffer of size |B|.
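The algorithm signature above only names its inputs and output, so the following is a minimal sketch of what such a buffer update could look like. The function name `update_buffer` and the `speed` scoring function are assumptions for illustration, not the paper's actual SBS implementation:

```python
import random

def update_buffer(buffer, task_data, buffer_size, q, s, speed):
    """Hypothetical sketch of a speed-based buffer update.

    `speed` maps an example to a learning-speed score (higher = learned
    quicker); this scoring is assumed, not defined in the excerpt.
    Drops the s slowest-learned and q quickest-learned examples, then
    refills the freed slots with a random sample of the new task's data.
    """
    ranked = sorted(buffer, key=speed)        # ascending: slowest-learned first
    kept = ranked[s:len(ranked) - q]          # remove s slowest and q quickest
    free = buffer_size - len(kept)            # slots available for new examples
    new = random.sample(task_data, min(free, len(task_data)))
    return kept + new
```

With a buffer of 10 examples scored by an identity `speed`, `q=2, s=3` keeps the 5 middle-ranked examples and fills the remaining 5 slots from the new task.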
Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available. Phrases like "We release our code..." or a direct repository link are missing.
Open Datasets | Yes | Datasets. We investigated various image continual learning classification tasks using split versions of several image datasets, including CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), and Tiny ImageNet (Le & Yang, 2015).
Dataset Splits | Yes | The data is split into T tasks by partitioning the classes into T equal-sized subsets. This partitioning is denoted as dataset-T. For example, splitting CIFAR-10 into 5 tasks is denoted as CIFAR-10-5, comprising 5 tasks, each with 2 distinct classes.
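The class-partitioning scheme described above can be sketched directly; the helper name `split_into_tasks` is chosen here for illustration:

```python
def split_into_tasks(classes, T):
    """Partition a list of class labels into T equal-sized, disjoint
    task subsets, as in the CIFAR-10-5 example (5 tasks of 2 classes)."""
    assert len(classes) % T == 0, "classes must divide evenly into T tasks"
    per_task = len(classes) // T
    return [classes[i * per_task:(i + 1) * per_task] for i in range(T)]
```

For CIFAR-10-5, `split_into_tasks(list(range(10)), 5)` yields `[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]`: 5 tasks with 2 distinct classes each.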
Hardware Specification | Yes | All networks were trained on an NVIDIA TITAN X.
Software Dependencies | No | The paper mentions software components and frameworks like "ResNet-18", "SGD optimizer", "cosine scheduler", "experience replay strategy", and the "framework of (Boschini et al., 2022; Buzzega et al., 2020)". However, it does not provide specific version numbers for these or other ancillary software dependencies (e.g., Python, PyTorch/TensorFlow versions).
Experiment Setup | Yes | In our experiments, unless stated otherwise, we trained a ResNet-18 for E = 100 epochs per task. We employed a base learning rate of 0.1 with a cosine scheduler, SGD optimizer, momentum of 0.9, and weight decay of 0.0005.
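The per-epoch learning rate implied by the stated base rate and cosine scheduler can be sketched as follows. This assumes a standard cosine annealing from the base rate down to 0 over the E epochs of a task, which the excerpt does not make explicit:

```python
import math

def cosine_lr(epoch, base_lr=0.1, total_epochs=100):
    """Cosine-annealed learning rate, decaying from base_lr at epoch 0
    to 0 at epoch total_epochs (assumed schedule; eta_min taken as 0)."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))
```

Under these assumptions the rate starts at 0.1, passes through 0.05 at the midpoint (epoch 50), and reaches 0 at epoch 100.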