Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Predicting the Susceptibility of Examples to Catastrophic Forgetting

Authors: Guy Hacohen, Tinne Tuytelaars

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our key observation is a last-in-first-out forgetting pattern: examples learned later are more prone to forgetting, while earlier-learned ones are preserved. This aligns with the simplicity bias of neural networks (Shah et al., 2020; Szegedy et al., 2014), where simpler examples are learned first. As a result, simple examples are consistently remembered, while more complex ones are forgotten as new data arrives. This pattern holds across a wide range of architectures, datasets, and training configurations, including variations in learning rates, optimizers, schedulers, epochs, and regularization strategies (see 2, App. D). Fig. 1 visualizes remembered vs. forgotten examples in CIFAR-100.
Researcher Affiliation | Academia | ESAT-PSI, KU Leuven, Belgium. Correspondence to: Guy Hacohen <EMAIL>, Tinne Tuytelaars <EMAIL>.
Pseudocode | Yes | Algorithm 1: Training CL method with SBS. Input: D_t, |B|, E, number of quick/slow examples to remove q, s. Output: buffer of size |B|.
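The algorithm signature above only names its inputs and output, so the following is a minimal sketch of what such a buffer update could look like. The function name `update_buffer` and the `speed` scoring function are assumptions for illustration, not the paper's actual SBS implementation:

```python
import random

def update_buffer(buffer, task_data, buffer_size, q, s, speed):
    """Hypothetical sketch of a speed-based buffer update.

    `speed` maps an example to a learning-speed score (higher = learned
    quicker); this scoring is assumed, not defined in the excerpt.
    Drops the s slowest-learned and q quickest-learned examples, then
    refills the freed slots with a random sample of the new task's data.
    """
    ranked = sorted(buffer, key=speed)        # ascending: slowest-learned first
    kept = ranked[s:len(ranked) - q]          # remove s slowest and q quickest
    free = buffer_size - len(kept)            # slots available for new examples
    new = random.sample(task_data, min(free, len(task_data)))
    return kept + new
```

With a buffer of 10 examples scored by an identity `speed`, `q=2, s=3` keeps the 5 middle-ranked examples and fills the remaining 5 slots from the new task.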
Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available. Phrases like "We release our code..." or a direct repository link are missing.
Open Datasets | Yes | Datasets. We investigated various image continual learning classification tasks using split versions of several image datasets, including CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), and Tiny ImageNet (Le & Yang, 2015).
Dataset Splits | Yes | The data is split into T tasks by partitioning the classes into T equal-sized subsets. This partitioning is denoted as dataset-T. For example, splitting CIFAR-10 into 5 tasks is denoted as CIFAR-10-5, comprising 5 tasks, each with 2 distinct classes.
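The class-partitioning scheme described above can be sketched directly; the helper name `split_into_tasks` is chosen here for illustration:

```python
def split_into_tasks(classes, T):
    """Partition a list of class labels into T equal-sized, disjoint
    task subsets, as in the CIFAR-10-5 example (5 tasks of 2 classes)."""
    assert len(classes) % T == 0, "classes must divide evenly into T tasks"
    per_task = len(classes) // T
    return [classes[i * per_task:(i + 1) * per_task] for i in range(T)]
```

For CIFAR-10-5, `split_into_tasks(list(range(10)), 5)` yields `[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]`: 5 tasks with 2 distinct classes each.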
Hardware Specification | Yes | All networks were trained on an NVIDIA TITAN X.
Software Dependencies | No | The paper mentions software components and frameworks like "ResNet-18", "SGD optimizer", "cosine scheduler", "experience replay strategy", and the "framework of (Boschini et al., 2022; Buzzega et al., 2020)". However, it does not provide specific version numbers for these or other ancillary software dependencies (e.g., Python, PyTorch/TensorFlow versions).
Experiment Setup | Yes | In our experiments, unless stated otherwise, we trained a ResNet-18 for E = 100 epochs per task. We employed a base learning rate of 0.1 with a cosine scheduler, SGD optimizer, momentum of 0.9, and weight decay of 0.0005.
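The per-epoch learning rate implied by the stated base rate and cosine scheduler can be sketched as follows. This assumes a standard cosine annealing from the base rate down to 0 over the E epochs of a task, which the excerpt does not make explicit:

```python
import math

def cosine_lr(epoch, base_lr=0.1, total_epochs=100):
    """Cosine-annealed learning rate, decaying from base_lr at epoch 0
    to 0 at epoch total_epochs (assumed schedule; eta_min taken as 0)."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))
```

Under these assumptions the rate starts at 0.1, passes through 0.05 at the midpoint (epoch 50), and reaches 0 at epoch 100.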