Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

REP: Resource-Efficient Prompting for Rehearsal-Free Continual Learning

Authors: Sungho Jeon, Xinyue Ma, Kwang In Kim, Myeongjae Jeon

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on multiple image classification datasets demonstrate REP's superior resource efficiency over state-of-the-art rehearsal-free CL methods.
Researcher Affiliation	Academia	Sungho Jeon, Xinyue Ma, Kwang In Kim, Myeongjae Jeon POSTECH EMAIL
Pseudocode	Yes	A Algorithm details. Algorithm 1 Adaptive Token Merging (AToM). Input: initial set of all tokens T; set of prompt tokens P; number of model layers L; maximum number of tokens to merge r_max. Initialize: T_final ← T. 1: for l ∈ {1, 2, ..., L} do 2: T_attn ← MSA(T_final) 3: T_eligible ← T_attn \ P 4: δ ← r_max / (L − 1) 5: n′ ← min(δ × (l − 1), r_max) 6: T_merged ← Merge(T_eligible, n′) 7: T_concat ← Concat(T_merged, P) 8: T_final ← MLP(T_concat, l) 9: end for 10: return T_final. Algorithm 2 Adaptive Layer Dropping (ALD). Input: input tensor X; keep ratio of layer θ_{t,l}; number of layers L; minimum ratio θ; decay rate γ; spatial threshold τ; adjustment factor α. 1: for l ∈ {1, 2, ..., L} do 2: if (n(l) − n′(l)) > τ then 3: α(l) ← α 4: else 5: α(l) ← 1 6: end if 7: θ_{t,l} ← α(l) (1 − θ) exp(−γ t) + θ 8: if Bernoulli(θ_{t,l}) = 1 then 9: X_out ← Exec(l, X_out) 10: else 11: X_out ← X_out 12: end if 13: end for 14: return X_out
Open Source Code	Yes	Our code is zipped and attached as a supplementary file for reproducibility.
Open Datasets	Yes	To organize task streams, we use three image classification datasets: CIFAR-100 (100 classes) [18], ImageNet-R (200 classes) [12], and Plant Disease (38 classes) [25].
Dataset Splits	Yes	We divide CIFAR-100 and ImageNet-R into 10 tasks to create Split CIFAR-100 (i.e., 10 classes per task) and Split ImageNet-R (i.e., 20 classes per task), respectively [38, 37, 31]. For Plant Disease, we drop 3 plant disease classes with very few images and organize the remaining 35 classes into 7 tasks to create Split Plant Disease (i.e., 5 classes per task).
Hardware Specification	Yes	We use an NVIDIA RTX 3090 to cover a wide range of memory capacities while maintaining consistent computational power.
Software Dependencies	No	For all prompt-based CL methods, we employ the ADAM optimizer [15] with hyperparameters β1 = 0.990 and β2 = 0.999, following their original implementation.
Experiment Setup	Yes	For all prompt-based CL methods, we employ the ADAM optimizer [15] with hyperparameters β1 = 0.990 and β2 = 0.999, following their original implementation. For prompt selection, we set the prompt pool size to 10 and the prompt length to 5, following the original implementations of L2P, DualPrompt, and HiDe-Prompt. For CODA-Prompt, we implement a cosine-decay learning rate strategy described in [31], while other prompt-based baselines use a fixed learning rate of 1.875 × 10^-3 by default. We consistently use the mini-batch size of 16 for all methods to maintain a uniform computational load for each training iteration. For fair comparisons, each method performs 1875, 1080, and 2583 iterations per task for Split CIFAR-100 [18], Split ImageNet-R [12], and Split Plant Disease [25], respectively.
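
The two schedules quoted in the Pseudocode row can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the authors' implementation. It reads step 4 of Algorithm 1 as δ = r_max / (L − 1) and step 7 of Algorithm 2 as α(l) (1 − θ) exp(−γ t) + θ. The function names, the floor rounding of the merge count, the example values (12 layers, r_max = 8), and the treatment of layers as plain callables are all hypothetical. The threshold test that chooses α(l) is left to the caller, because the quoted condition is only partly legible.

```python
import math
import random


def merge_count(layer: int, num_layers: int, r_max: int) -> int:
    """Tokens to merge at a 1-indexed layer: a linear ramp capped at r_max.

    Assumes delta = r_max / (num_layers - 1) and floor rounding; integer
    arithmetic avoids float error at the last layer. Needs num_layers > 1.
    """
    return min(r_max * (layer - 1) // (num_layers - 1), r_max)


def keep_ratio(t: float, theta_min: float, gamma: float, alpha_l: float = 1.0) -> float:
    """Keep probability alpha(l) * (1 - theta) * exp(-gamma * t) + theta."""
    return alpha_l * (1.0 - theta_min) * math.exp(-gamma * t) + theta_min


def run_layers(x, layers, t, theta_min, gamma, rng):
    """Execute each layer with probability keep_ratio; otherwise pass x through."""
    for layer in layers:
        if rng.random() < keep_ratio(t, theta_min, gamma):
            x = layer(x)  # layer kept: apply it
        # layer dropped: x is forwarded unchanged
    return x


if __name__ == "__main__":
    # Hypothetical configuration: 12 layers, at most 8 merged tokens.
    print([merge_count(l, 12, 8) for l in range(1, 13)])
    print(keep_ratio(t=0.0, theta_min=0.5, gamma=0.1))
```

Under this reading the merge count grows from 0 at the first layer to r_max at the last, and the keep ratio decays from its initial value toward the minimum ratio θ as t increases, so more layers are skipped later on. The unit of t is not given in the quoted excerpt.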