Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Parameter Dynamics of Online Machine Learning and Test-time Adaptation

Authors: Jae-Hong Lee

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments across diverse models, datasets, and adaptation scenarios show that SIGMA consistently enhances the performance of state-of-the-art TTA methods, highlighting the critical role of parameter dynamics in ensuring robust adaptation.
Researcher Affiliation	Academia	Jae-Hong Lee, Division of Language & AI, Hankuk University of Foreign Studies, Seoul, Republic of Korea. EMAIL
Pseudocode	Yes	Illustrations of the algorithm and pseudocode are provided in Appendix B.
Open Source Code	No	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We use publicly available, open-access pre-trained models and datasets for all of our experiments. Additionally, we provide pseudocode for the proposed algorithm in Appendix B.1 to facilitate reproducibility and implementation. (The justification mentions pseudocode but not the actual implementation code for the methodology, which is required for a 'Yes' answer.)
Open Datasets	Yes	Datasets: We evaluated SIGMA across both multi-domain and single-domain datasets. For multi-domain adaptation, we used ImageNet-C and D109; for single-domain evaluation, we used Rendition [22] and Sketch [21].
Dataset Splits	Yes	ImageNet-C extends the original ImageNet dataset, consisting of 1,281,167 training images and 50,000 test images, by applying 15 types of corruption (e.g., Gaussian noise, shot noise, defocus blur, frost, JPEG compression) at five severity levels. Following standard practice [45, 29, 43], we used severity level 5 and treated each corruption type as a distinct domain. (...) In the Correlated Input setting, domain-wise data were presented to the model in a fixed sequence, simulating temporally evolving input distributions [6, 62, 66].
Hardware Specification	Yes	All experiments were conducted using a single NVIDIA GeForce RTX 4090 GPU.
Software Dependencies	No	We implemented the Kolmogorov-Smirnov test via the scipy.stats.kstest and the Nelder-Mead simplex method via the scipy.stats.fit. (Specific version numbers for software dependencies are not provided.)
Experiment Setup	Yes	We used SGD with a momentum of 0.9, source-parameter averaging [46, 43, 31], and a batch size of 64 as the base optimizer. Learning rates were set to 1.0 × 10⁻⁵ for D2V, 2.5 × 10⁻⁴ for ViT and Swin, and 1.0 × 10⁻³ for SAR (using the SAM optimizer [13]) across both ViT and Swin models. The SIGMA alignment strength λ was fixed per method and held constant across experiments unless otherwise noted: λ = 5.0 × 10⁻⁵ for TENT, 7.5 × 10⁻⁵ for EATA and ROID, and 5.0 × 10⁻⁵ for DeYO.
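
The Software Dependencies row names a Kolmogorov-Smirnov test without saying what it is applied to. As a rough illustration only, the sketch below computes the two-sided one-sample KS statistic using the Python standard library. It is not the paper's implementation: the function name `ks_statistic` and the synthetic samples are hypothetical, and the standard normal reference distribution is an assumption made for the example.

```python
from statistics import NormalDist


def ks_statistic(samples, cdf):
    """Two-sided one-sample Kolmogorov-Smirnov statistic.

    Returns the largest gap between the empirical CDF of `samples`
    and the reference CDF `cdf`, checked on both sides of each jump.
    """
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    # Gap just after each jump of the empirical CDF (ECDF above reference).
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    # Gap just before each jump of the empirical CDF (reference above ECDF).
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


# Hypothetical data: evenly spaced quantiles of N(0, 1) stand in for real values.
reference = NormalDist(mu=0.0, sigma=1.0)
matched = [reference.inv_cdf((i + 0.5) / 200) for i in range(200)]
# The same values shifted by one standard deviation no longer match N(0, 1).
shifted = [x + 1.0 for x in matched]

d_matched = ks_statistic(matched, reference.cdf)
d_shifted = ks_statistic(shifted, reference.cdf)
print(f"matched: {d_matched:.4f}  shifted: {d_shifted:.4f}")
```

A small statistic means the sample is close to the reference distribution; a large one signals a mismatch. With SciPy installed, `scipy.stats.kstest(samples, "norm")` reports the same two-sided statistic together with a p-value.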