Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Autoencoder-Based Hybrid Replay for Class-Incremental Learning

Authors: Milad Khademi Nori, Il Min Kim, Guanghui Wang

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We provide comprehensive experiments to demonstrate the strong performance of AHR: we conduct our experiments across five benchmarks and ten baselines to showcase the effectiveness of AHR utilizing HAE and RFA while operating with the same memory and compute budgets. ... Table 2: Empirical evaluation of AHR against a suite of CIL baselines. Accuracies and the SEMs.
Researcher Affiliation	Academia	1Department of Computer Science, Toronto Metropolitan University, Toronto, Ontario, Canada 2Electrical and Computer Engineering, Queen's University, Kingston, Ontario, Canada. Correspondence to: Milad Khademi Nori <EMAIL>.
Pseudocode	Yes	Algorithm 1: Autoencoder-Based Hybrid Replay; Algorithm 2: CCE Placement; Algorithm 3: HAE Training; Algorithm 4: Memory Population
Open Source Code	Yes	Implementation is available at github.com/miladkhademinori/autoencoderhybrid-replay-cil.
Open Datasets	Yes	We have MNIST(5/2) (LeCun et al., 2010), Balanced SVHN(5/2) (Netzer et al., 2011), CIFAR-10(5/2) (Krizhevsky et al., 2009), CIFAR-100(10/10) (Krizhevsky et al., 2009), and miniImageNet(20/5) (Vinyals et al., 2016) benchmarks.
Dataset Splits	Yes	The series of tasks for CIL are constructed according to (Masana et al., 2020; Ven et al., 2021; Zając et al., 2023), where the popular image classification datasets are split up such that each task presents data pertaining to a subset of classes, in a non-overlapping manner. For naming benchmarks, we follow (Masana et al., 2020), where dataset D is divided into T tasks with C classes for each task. Hence, a benchmark is named as D(T/C).
Hardware Specification	No	No specific hardware details (e.g., GPU/CPU models, memory) are provided in the paper.
Software Dependencies	No	The paper mentions using 'Adam (Kingma & Ba, 2014) as the optimizer' and 'ResNet-32' for network architecture, but does not provide specific version numbers for software libraries or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup	No	The main text states: 'The learning rates, batch sizes, and strategy-dependent hyperparameters are detailed in Appendix B in the supplementary material.' However, specific concrete values for these hyperparameters are not provided in the main text of the paper.
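The D(T/C) naming convention and the non-overlapping class split described in the Dataset Splits entry can be illustrated with a minimal Python sketch. It assumes classes are assigned to tasks in a fixed order (sorted label order by default); the actual class ordering used by the authors, following Masana et al. (2020), may differ (for example, a seeded random permutation). The function names parse_benchmark_name and split_into_tasks are hypothetical and are not taken from the AHR repository.

```python
import re
from typing import Dict, List, Optional, Sequence, Tuple


def parse_benchmark_name(name: str) -> Tuple[str, int, int]:
    """Parse a benchmark name of the form D(T/C) into (D, T, C)."""
    match = re.fullmatch(r"(.+)\((\d+)/(\d+)\)", name.strip())
    if match is None:
        raise ValueError(f"not a D(T/C) benchmark name: {name!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


def split_into_tasks(
    labels: Sequence[int],
    num_tasks: int,
    classes_per_task: int,
    class_order: Optional[Sequence[int]] = None,
) -> List[Dict[str, List[int]]]:
    """Assign disjoint groups of classes to consecutive tasks (hypothetical helper).

    Each task receives `classes_per_task` classes taken in `class_order`
    (sorted labels if omitted) and the indices of all samples whose label
    belongs to one of those classes, so no class appears in two tasks.
    """
    classes = sorted(set(labels)) if class_order is None else list(class_order)
    if len(classes) != num_tasks * classes_per_task:
        raise ValueError(
            f"{len(classes)} classes cannot form {num_tasks} tasks "
            f"of {classes_per_task} classes each"
        )
    tasks = []
    for t in range(num_tasks):
        task_classes = classes[t * classes_per_task:(t + 1) * classes_per_task]
        wanted = set(task_classes)
        indices = [i for i, y in enumerate(labels) if y in wanted]
        tasks.append({"classes": task_classes, "indices": indices})
    return tasks


if __name__ == "__main__":
    # Toy example: CIFAR-10(5/2) means 10 classes split into 5 tasks of 2 classes.
    _, num_tasks, classes_per_task = parse_benchmark_name("CIFAR-10(5/2)")
    toy_labels = [i % 10 for i in range(50)]  # 10 classes, 5 samples each
    for t, task in enumerate(split_into_tasks(toy_labels, num_tasks, classes_per_task)):
        print(t, task["classes"], len(task["indices"]))
```

Under this sketch, CIFAR-100(10/10) and miniImageNet(20/5) both cover 100 classes, so the size check passes only when T × C equals the number of distinct labels in the dataset.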