Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Seeing Beyond Labels: Source-Free Domain Adaptation via Hypothesis Consolidation of Prediction Rationale

Authors: Yangyang Shu, Yuhang Liu, Xiaofeng Cao, Qi Chen, Bowen Zhang, Ziqin Zhou, Anton van den Hengel, Lingqiao Liu

TMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experimental results demonstrate that our approach achieves state-of-the-art performance in the SFUDA task and can be easily integrated into existing approaches to improve their performance.
Researcher Affiliation Academia Yangyang Shu (EMAIL), School of Systems and Computing, University of New South Wales; Yuhang Liu (EMAIL), Australian Institute for Machine Learning, The University of Adelaide; Xiaofeng Cao (EMAIL), School of Computer Science and Technology, Tongji University; Qi Chen (EMAIL), Australian Institute for Machine Learning, The University of Adelaide; Bowen Zhang (EMAIL), Australian Institute for Machine Learning, The University of Adelaide; Ziqin Zhou (EMAIL), Australian Institute for Machine Learning, The University of Adelaide; Anton van den Hengel (EMAIL), Australian Institute for Machine Learning, The University of Adelaide; Lingqiao Liu (EMAIL), School of Computer Science, The University of Adelaide
Pseudocode Yes Algorithm 1 SFUDA with Hypothesis Consolidation of Prediction Rationale
Open Source Code Yes The codes are available at https://github.com/GANPerf/HCPR.
Open Datasets Yes Office-Home Venkateswara et al. (2017) consists of 15,500 images categorized into 65 classes. It includes four distinct domains: Real-world (Rw), Clipart (Cl), Art (Ar), and Product (Pr). ... Originally, the DomainNet dataset Peng et al. (2019) consisted of over 500,000 images, including six domains and 345 classes. ... VisDA-C Peng et al. (2017) contains 152,000 synthetic images from the source domain and 55,000 real object images from the target domain.
Dataset Splits Yes Office-Home Venkateswara et al. (2017) consists of 15,500 images categorized into 65 classes. It includes four distinct domains: Real-world (Rw), Clipart (Cl), Art (Ar), and Product (Pr). To evaluate the proposed method, researchers perform 12 transfer tasks on this dataset, adapting models across the four domains. The evaluation reports Top-1 accuracy for each domain shift as well as the average Top-1 accuracy. Originally, the DomainNet dataset Peng et al. (2019) consisted of over 500,000 images, including six domains and 345 classes. For our evaluation, we follow the approach described in Saito et al. (2019) and focus on four domains: Real World (Rw), Sketch (Sk), Clipart (Cl), and Painting (Pt). We assess our proposed method on seven domain shifts within these four domains. VisDA-C Peng et al. (2017) contains 152,000 synthetic images from the source domain and 55,000 real object images from the target domain. It consists of 12 object classes, and there is a significant synthetic-to-real domain gap between the two domains. Our evaluation reports per-class Top-1 accuracies, as well as the average Top-1 accuracy on this dataset.
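The VisDA-C protocol quoted above reports per-class Top-1 accuracies and their unweighted mean. A minimal sketch of that computation (function name and flat-list data layout are illustrative, not from the paper):

```python
from collections import defaultdict

def per_class_top1(preds, labels):
    """Per-class Top-1 accuracy and their unweighted mean,
    as in VisDA-C style reporting."""
    correct, total = defaultdict(int), defaultdict(int)
    for p, y in zip(preds, labels):
        total[y] += 1
        correct[y] += int(p == y)  # count hits per ground-truth class
    per_class = {c: correct[c] / total[c] for c in sorted(total)}
    avg = sum(per_class.values()) / len(per_class)  # unweighted class mean
    return per_class, avg
```

Note that the unweighted class mean differs from overall accuracy when classes are imbalanced, which is why VisDA-C papers report both the per-class numbers and their average.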
Hardware Specification Yes The running time is measured on 1 Tesla A100 GPU with 40 epochs.
Software Dependencies No The paper mentions using 'ResNet-50/101' as the network backbone and 'SGD optimizer', but does not provide specific version numbers for these software components or any other libraries used.
Experiment Setup Yes In the first step of model pre-adaptation, we use a batch size of 64. The value of λ is set as λ = λ0 (1 + 10p)^(-5), where λ0 = 1, and p represents the training progress variable ranging from 0 to 1, calculated as iter/max_iter. In the second step of hypothesis consolidation, we set the number of nearest/furthest neighbors per instance z as 3, and set the number of hypotheses per instance k as 4, respectively. The ranking thresholds τ1 and τ2 are determined as a percentage of the total number of samples on the three datasets, specifically set at 0.8% and 1.6%. In the third step of semi-supervised learning, we set the size of Bl and Bu to 64. We use the SGD optimizer with a momentum of 0.9 and a weight decay of 1e-3 for all datasets. The learning rate is set as 1e-4 for all datasets, except for the bottleneck layer and the additional fully connected layer, where it is set as 1e-3. We train for 40 epochs on the Office-Home and DomainNet datasets, where 9 epochs are dedicated to the model pre-adaptation. For the VisDA-C dataset, we train for 15 epochs, with 7 epochs allocated for the model pre-adaptation. ... τ is the threshold defined in FixMatch to identify reliable pseudo-labels (we set the same as FixMatch, 0.95)
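The quoted λ schedule can be sketched as follows, assuming the exponent acts as a polynomial decay, i.e. λ = λ0 (1 + 10p)^(-5) with p = iter/max_iter (a common decay form in SFDA implementations; the reconstruction of the lost superscript and sign is an assumption):

```python
def lambda_schedule(iter_num, max_iter, lambda0=1.0):
    """Decay weight lambda = lambda0 * (1 + 10*p)^(-5),
    where p = iter_num / max_iter is training progress in [0, 1]."""
    p = iter_num / max_iter
    return lambda0 * (1.0 + 10.0 * p) ** (-5)
```

With λ0 = 1 the weight starts at 1.0 (p = 0) and decays monotonically toward 11^(-5) ≈ 6.2e-6 at the end of training (p = 1).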