Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Self-Training with Dynamic Weighting for Robust Gradual Domain Adaptation

Authors: Zixi Wang, Yushe Cao, Yubo Huang, Jinzhu Wei, Jingzehua Xu, Shuai Zhang, Xin Lai

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on rotated MNIST, color-shifted MNIST, portrait datasets, and the Cover Type dataset demonstrate that STDW outperforms existing baselines. Ablation studies further validate the critical role of ϱ's dynamic scheduling in achieving progressive adaptation, confirming its effectiveness in reducing domain bias and improving generalization.
Researcher Affiliation	Collaboration	Zixi Wang (University of Electronic Science and Technology of China, Chengdu, Sichuan, China); Yushe Cao (Tsinghua University, Beijing, China); Yubo Huang (Zhenguan AI Lab, Shenzhen, Guangdong, China; Southwest Jiaotong University, Chengdu, Sichuan, China); Jinzhu Wei (Shanghai University, Shanghai, China; EMAIL); Jingzehua Xu (Tsinghua University, Beijing, China; EMAIL); Shuai Zhang (New Jersey Institute of Technology, Newark, NJ, United States); Xin Lai (Southwest Jiaotong University, Chengdu, Sichuan, China)
Pseudocode	Yes	Algorithm 1: Self-Training with Dynamic Weighting (STDW). 1: Input: source batches $\{B_{0,k}\}_{k=1}^{m}$, domain sequence $\{D_t\}_{t=1}^{n}$, initial model $f^{(0,0)}$, inter-domain migration steps $s$. 2: Output: adapted model $f^{(n,m)}$.
Open Source Code	Yes	The code is available at https://github.com/Dramwig/STDW.
Open Datasets	Yes	Following established gradual domain adaptation protocols [6, 9], we utilize four core datasets: Rotated MNIST [18] and Color-Shift MNIST for controlled synthetic transformations, Portraits Dataset [19] for real-world temporal shifts, and Cover Type Dataset [20] for tabular domain adaptation. To further stress-test our method under severe distribution shifts, we incorporate two additional corruption benchmarks: CIFAR-10-C and CIFAR-100-C [21].
Dataset Splits	No	The paper mentions utilizing well-known datasets such as Rotated MNIST, Color-Shift MNIST, Portraits Dataset, Cover Type Dataset, CIFAR-10-C, and CIFAR-100-C, and refers to "established gradual domain adaptation protocols [6, 9]". While these protocols often imply standard splits, the main text of the paper does not give train/validation/test split percentages or sample counts for these datasets.
Hardware Specification	Yes	All experiments are conducted on NVIDIA RTX 4090 GPUs with identical random seeds for reproducibility.
Software Dependencies	No	The paper mentions using "ReLU activations, batch normalization [22], and dropout regularization [23], optimized using Adam [24]" but does not provide specific version numbers for these software components or any underlying frameworks like Python, PyTorch, or TensorFlow.
Experiment Setup	Yes	For image datasets (Rotated MNIST, Color-Shift MNIST, Portraits), we implement a convolutional neural network comprising three 32-channel convolutional layers followed by two 256-unit fully connected layers. The tabular Cover Type dataset utilizes a progressively expanding fully connected architecture (128-256-512 units). All models incorporate ReLU activations, batch normalization [22], and dropout regularization [23], optimized using Adam [24] with carefully tuned hyperparameters. ... For the corruption benchmarks, we employ established robust architectures: Wide ResNet-28 [25] for CIFAR-10-C and ResNeXt-29 [26] for CIFAR-100-C, following RobustBench protocols [27, 28] to ensure comparable evaluation conditions.
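
To give a rough sense of the scale of the tabular model described in the Experiment Setup row, the sketch below counts the weights and biases of a fully connected stack with hidden widths 128-256-512. Only the hidden widths come from the row. The input size (54 features) and output size (2 classes) are assumptions for illustration, and the helper name `dense_param_count` is hypothetical. Batch normalization parameters are not counted, and the paper's actual model may differ.

```python
def dense_param_count(widths):
    """Count weights plus biases for a stack of fully connected layers.

    `widths` lists layer sizes from input to output, e.g. [in, h1, h2, out].
    Batch normalization and dropout add no parameters here (BN is excluded).
    """
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))


# Hidden widths taken from the Experiment Setup row (128-256-512 units).
HIDDEN_WIDTHS = [128, 256, 512]

# Assumptions, not stated in the row: 54 input features and 2 output classes.
N_FEATURES = 54
N_CLASSES = 2

widths = [N_FEATURES, *HIDDEN_WIDTHS, N_CLASSES]
total = dense_param_count(widths)
print(f"layer widths: {widths}")
print(f"dense parameters (weights + biases): {total}")  # 172674 under these assumptions
```

Changing `N_FEATURES` or `N_CLASSES` shows how sensitive the count is to the unstated input and output dimensions, which is one reason explicit setup details matter for reproduction.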