Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Class-aware Domain Knowledge Fusion and Fission for Continual Test-Time Adaptation

Authors: Jiahuan Zhou, Chao Zhu, Zhenyu Cui, Zichen Liu, Xu Zou, Gang Hua

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on the ImageNet-C dataset verify the effectiveness of our proposed method against other methods. The experimental results show its effectiveness under various changes of testing domains. In particular, our KFF achieved 34.8% error under the ImageNet-to-ImageNet-C distributional shift case, which surpasses the previous SOTA method DPCore [61] by 5.1%.
Researcher Affiliation	Collaboration	Jiahuan Zhou¹, Chao Zhu¹, Zhenyu Cui¹, Zichen Liu¹, Xu Zou², Gang Hua³; ¹Wangxuan Institute of Computer Technology, Peking University, Beijing 100871, China; ²Huazhong University of Science and Technology, Wuhan 430074, China; ³Amazon.com, Inc, Bellevue, WA 98004, USA
Pseudocode	Yes	Algorithm 1: Algorithm of Updating Class Prompt Pool; Algorithm 2: Algorithm of Updating Domain Prompt Pool
Open Source Code	No	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: The code will be made publicly available after the paper is accepted.
Open Datasets	Yes	The datasets used, CIFAR10-C, CIFAR-100-C and ImageNet-C [15, 23], are publicly available online, released under the Apache 2.0 license.
Dataset Splits	Yes	Datasets. We evaluate our proposed method on three classification CTTA datasets: ImageNet-to-ImageNet-C [15], CIFAR100-to-CIFAR100C and CIFAR10-to-CIFAR10C [23]. Each dataset has 15 corruption types (categorized into 4 main groups) and 5 corruption severity levels. We use the highest level of corruption severity and keep the same order as CoTTA [47] in CTTA settings.
Hardware Specification	Yes	Further Comparison on Efficiency. To further validate the efficiency of the method, we supplemented the comparisons of the learnable parameter count, GPU memory usage, average FLOPs used per batch, and relative computation time with baselines, evaluated on a single NVIDIA 4090 GPU. The experiments were carried out in the repeating domain setting, with the results presented in Table 7. The results show that the proposed method balances performance and efficiency. When compared to Tent (superior raw efficiency), it gains 15.2% performance with minimal extra cost. Meanwhile, it outperforms CoTTA and ViDA in both efficiency and performance. Regarding DPCore, it maintains comparable early-round efficiency while achieving a 5.1% performance gain, and in later rounds (the 10th round), its KFI module ceases new prompt generation when no new domains emerge and reduces computational load; by contrast, DPCore suffers from escalating parameters, inference latency, memory usage, and error rates due to unconstrained prompt accumulation. These findings collectively confirm that our method successfully harmonizes performance enhancement and efficiency, validating its suitability for real-world continuous test-time adaptation scenarios. Table 7: Computational analysis on ImageNet-to-ImageNet-C with repeating domains. Entries are listed as Method (Venue): Params. (M) / Time / Mem. (GB) / TFLOPs / Err Mean. First block: Tent (ICLR 21): 0.03 / 1.0 / 5.5 / 1.08 / 51.0; CoTTA (CVPR 22): 86.57 / 4.7 / 16.2 / 3.24 / 49.9; ViDA (ICLR 24): 93.70 / 35.3 / 9.3 / 14.03 / 43.4; DPCore (ICML 25): 0.08 / 1.6 / 5.7 / 1.67 / 39.9; Ours: 0.09 / 1.9 / 6.0 / 1.68 / 34.8. Second block: Tent (ICLR 21): 0.03 / 1.0 / 5.5 / 1.08 / 99.9; CoTTA (CVPR 22): 86.57 / 4.7 / 16.2 / 3.24 / 53.5; ViDA (ICLR 24): 93.70 / 35.3 / 9.3 / 14.03 / 42.3; DPCore (ICML 25): 1.03 / 2.1 / 8.4 / 1.73 / 46.8; Ours: 0.20 / 1.8 / 6.1 / 1.60 / 34.5.
Software Dependencies	No	The paper mentions the use of an "AdamW optimizer" and that ViTs are "loaded from timm", but no specific version numbers for these or other software libraries (e.g., PyTorch, CUDA) are provided.
Experiment Setup	Yes	Implementation Details. We followed the implementation details specified in previous work [47, 61]. We use ViT-B/16 as our backbone. We utilize the AdamW optimizer with a learning rate 0.1 for domain prompts and 0.001 for class prompts with a batch size b = 64. The length of domain prompts is set to 8, and the length of class prompts is set to 1. Other hyper-parameters γd, γc, γh, αd, αc, τd, τc, a, Nd and Nc are set to 25, 0.005, 2, 0.1, 0.1, 3, 1, 3, 20 and 100. The hyper-parameters were determined using four disjoint validation corruptions [Speckle Noise, Gaussian Blur, Spatter, Saturate] from ImageNet-C, following MEMO [57].
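
The Experiment Setup row lists ten hyper-parameter names followed by ten values, which is easy to misread. The sketch below records the quoted settings as a small Python configuration, as one way a reader might transcribe them. It is not the authors' code: the class name `KFFSetup`, the field names, the ASCII spellings such as `gamma_d`, and the helper functions are hypothetical, and the name-to-value pairing assumes the two lists in the row are given in the same order.

```python
from dataclasses import dataclass

# Hypothetical container for the settings quoted in the Experiment Setup row.
@dataclass(frozen=True)
class KFFSetup:
    backbone: str = "ViT-B/16"
    optimizer: str = "AdamW"
    lr_domain_prompts: float = 0.1
    lr_class_prompts: float = 0.001
    batch_size: int = 64
    domain_prompt_length: int = 8
    class_prompt_length: int = 1
    # Validation corruptions from ImageNet-C used to choose hyper-parameters.
    validation_corruptions: tuple = (
        "Speckle Noise", "Gaussian Blur", "Spatter", "Saturate",
    )

# ASCII stand-ins for the symbols in the row, in the order they are listed.
HYPERPARAMETER_NAMES = (
    "gamma_d", "gamma_c", "gamma_h", "alpha_d", "alpha_c",
    "tau_d", "tau_c", "a", "N_d", "N_c",
)
HYPERPARAMETER_VALUES = (25, 0.005, 2, 0.1, 0.1, 3, 1, 3, 20, 100)


def paired_hyperparameters():
    """Pair names with values, assuming both lists share the same order."""
    if len(HYPERPARAMETER_NAMES) != len(HYPERPARAMETER_VALUES):
        raise ValueError("name and value lists differ in length")
    return dict(zip(HYPERPARAMETER_NAMES, HYPERPARAMETER_VALUES))


def optimizer_param_groups(setup):
    """Describe the two learning-rate groups as plain dictionaries."""
    return [
        {"name": "domain_prompts", "lr": setup.lr_domain_prompts},
        {"name": "class_prompts", "lr": setup.lr_class_prompts},
    ]


if __name__ == "__main__":
    setup = KFFSetup()
    print(setup)
    for name, value in paired_hyperparameters().items():
        print(f"{name} = {value}")
    print(optimizer_param_groups(setup))
```

The two dictionaries returned by `optimizer_param_groups` mirror the separate learning rates quoted for domain and class prompts. Since the Software Dependencies row notes that no library versions are given, the sketch deliberately stays within the Python standard library.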