Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

Mixture of Noise for Pre-Trained Model-Based Class-Incremental Learning

Authors: Kai Jiang, Zhengyan Shi, Dell Zhang, Hongyuan Zhang, Xuelong Li

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on six benchmark datasets demonstrate that MIN achieves state-of-the-art performance in most incremental settings, with particularly outstanding results in 50-steps incremental settings. In this section, we evaluate MIN on several benchmark datasets and compare it with other SOTA methods to demonstrate its superiority. In addition, we provide an ablation study and a visualized analysis to validate the effectiveness of MIN.
Researcher Affiliation	Collaboration	(1) School of Artificial Intelligence, OPtics and ElectroNics, Northwestern Polytechnical University; (2) Institute of Artificial Intelligence (TeleAI) of China Telecom
Pseudocode	Yes	Algorithm 1: Training pipeline for MIN. Input: incremental datasets {D_1, ..., D_T}; pre-trained backbone F = {f_1, ..., f_L}. Output: incrementally trained model.
Open Source Code	Yes	Code is available at https://github.com/ASCIIJK/MiN-NeurIPS2025.
Open Datasets	Yes	We conduct experiments on six benchmark datasets, including CIFAR100 [47], CUB200 [48], ImageNet-A [49], ImageNet-R [50], FOOD101 [51] and Omnibenchmark [52].
Dataset Splits	Yes	Following recent CIL works, we split a dataset into T tasks, denoted as T steps. Each task learns the same number of categories. To further investigate the performance of all methods across various step sizes, we set the range of T from 5 to 50, i.e., T = 5, 10, 20, and 50. Table 6: Details of six datasets (columns: Dataset | Classes | Train | Test | Avg size). CIFAR-100 | 100 | 50,000 | 10,000 | 32 × 32; CUB-200 | 200 | 9,430 | 2,358 | 467 × 386; ImageNet-A | 200 | 5,960 | 1,515 | 443 × 427; ImageNet-R | 200 | 24,000 | 6,000 | 443 × 427; Omnibenchmark | 300 | 89,697 | 5,985 | 764 × 581; FOOD-101 | 101 | 75,750 | 25,250 | 496 × 475.
Hardware Specification	Yes	CPU: Intel Xeon(R) Gold 6244 CPU; GPU: 2 × NVIDIA GeForce RTX 4090; Mem: 8 × DDR4 SAMSUNG-32GB
Software Dependencies	No	We run all the experiments with PyTorch [56] and reproduce all other comparison methods with Pilot [57].
Experiment Setup	Yes	In MIN, we set the batch size to 128 and train for 10 epochs using SGD optimizer with momentum. The learning rate is initially set to 0.001 and decays to 0 following a cosine annealing decay pattern. The dimension d2 of the latent vector within the Pi-Noise layer is set to 192, with more configurations of this hyperparameter detailed in the supplementary materials.
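
Illustrative sketch (not taken from the paper or its repository): the Dataset Splits and Experiment Setup rows describe an equal-size class split into T tasks and a learning rate that starts at 0.001 and decays to 0 along a cosine curve. The Python below is one possible reading of those two statements. The function names `split_classes` and `cosine_lr` are hypothetical. The contiguous class ordering, the per-epoch (rather than per-iteration) schedule granularity, and the rejection of class counts that do not divide evenly by T are all assumptions; the quoted text does not say how a dataset such as FOOD-101 (101 classes) is divided.

```python
import math


def split_classes(num_classes: int, num_tasks: int) -> list[list[int]]:
    """Partition class ids 0..num_classes-1 into num_tasks equal groups.

    Assumption: classes are assigned in contiguous id order; the quoted
    text does not specify the ordering or any shuffling.
    """
    if num_classes % num_tasks != 0:
        # Assumption: uneven splits are rejected here because the source
        # does not describe how leftover classes are handled.
        raise ValueError("num_classes must be divisible by num_tasks")
    per_task = num_classes // num_tasks
    return [list(range(t * per_task, (t + 1) * per_task))
            for t in range(num_tasks)]


def cosine_lr(step: int, total_steps: int, base_lr: float = 0.001) -> float:
    """Cosine annealing from base_lr at step 0 down to 0 at total_steps."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))


if __name__ == "__main__":
    # CIFAR-100 with T = 10 gives 10 tasks of 10 classes each.
    tasks = split_classes(num_classes=100, num_tasks=10)
    print(len(tasks), len(tasks[0]))

    # Learning rate at the start of each of the 10 epochs, plus the end point.
    for epoch in range(11):
        print(epoch, cosine_lr(epoch, total_steps=10))
```

With T = 50 on CIFAR-100, the same split yields 50 tasks of 2 classes each, which corresponds to the 50-step setting mentioned in the Research Type row.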