Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

RrED: Black-box Unsupervised Domain Adaptation via Rectifying-reasoning Errors of Diffusion

Authors: Yuwu Lu, Chunzhi Liu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments confirm that RrED significantly outperforms other methods on four benchmark datasets, demonstrating its effectiveness in enhancing reasoning and generalization abilities. ... To evaluate the effectiveness of RrED, we conduct extensive experiments, achieving SOTA performance on four benchmarks. Ablation studies further highlight the contributions of each component and provide a detailed analysis of the relationship among them.
Researcher Affiliation	Academia	Yuwu Lu School of Artificial Intelligence South China Normal University Foshan, Guangdong, China EMAIL Chunzhi Liu School of Artificial Intelligence South China Normal University Foshan, Guangdong, China EMAIL
Pseudocode	Yes	Our pseudocode for the training process is shown in Algorithm 1. In addition, our experimental and main code are available in the Supplementary Material. ... Algorithm 1 RrED for BUDA task.
Open Source Code	Yes	Our pseudocode for the training process is shown in Algorithm 1. The implementation details in Appendix F. The experimental code and the main code are available in the Supplementary Materials. ... The experimental code and the main code are available in the Supplementary Materials.
Open Datasets	Yes	RrED is evaluated on four widely-used domain adaptation benchmarks. Office-31 [53] is a small-scale dataset with 4,110 images in 31 categories from three domains: Amazon (A), Dslr (D), and Webcam (W). Office-Home [54] is a medium-scale dataset, containing 15.5K images across 65 categories from four domains: Real World (R), Clipart (C), Art (A), and Product (P). VisDA-17 [55] is a large-scale benchmark, including 152K synthetic images (source) and 55K real-world images (target) across 12 categories, emphasizing the synthetic-to-real domain gap. DomainNet [56] is the most extensive benchmark, with about 600K images.
Dataset Splits	Yes	VisDA-17 [55] is a large-scale benchmark, including 152K synthetic images (source) and 55K real-world images (target) across 12 categories, emphasizing the synthetic-to-real domain gap. DomainNet [56] is the most extensive benchmark, with about 600K images. Following previous methods [17, 52], the evaluation setup for adaptation scenarios involves merely 4 domains with 126 categories, including Real (R), Clipart (C), Painting (P), and Sketch (S).
Hardware Specification	Yes	We implement our RrED based on PyTorch and conduct all experiments using an NVIDIA GeForce RTX4090 GPU.
Software Dependencies	No	We implement our RrED based on PyTorch and conduct all experiments using an NVIDIA GeForce RTX4090 GPU. ... For fair comparison, the backbone network is initialized following the protocol in [15], employing the ImageNet [64] pre-trained ResNet architectures: ResNet-50 for Office-31, Office-Home, and DomainNet, and ResNet-101 for VisDA-17.
Experiment Setup	Yes	The optimization configuration employs SGD with a momentum of 0.9, a weight decay of 1e-3, and differentiated learning rates, where the learning rate is set to 1e-4 for the feature extractor fθ and 1e-3 for the classifier cθ. Following [16, 17], we set the bottleneck dimension to 256, the batch size to 64, the static momentum coefficient µ to 0.6, and the number of warm-up epochs to 3.
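
The reported experiment setup can be collected into a small configuration, which may help when trying to reproduce the optimizer settings. The following is a minimal sketch, not the authors' code: the names `SetupConfig`, `sgd_param_groups`, and `sgd_kwargs` are hypothetical, and only the numeric values come from the quoted text. The list-of-dicts layout is the per-parameter-group form that `torch.optim.SGD` accepts, but the sketch itself uses only the standard library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetupConfig:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    momentum: float = 0.9              # SGD momentum
    weight_decay: float = 1e-3         # SGD weight decay
    lr_feature_extractor: float = 1e-4 # learning rate for the feature extractor
    lr_classifier: float = 1e-3        # learning rate for the classifier
    bottleneck_dim: int = 256
    batch_size: int = 64
    static_momentum_mu: float = 0.6    # static momentum coefficient (mu)
    warmup_epochs: int = 3


def sgd_param_groups(feature_params, classifier_params, cfg):
    """Build two parameter groups, each with its own learning rate."""
    return [
        {"params": list(feature_params), "lr": cfg.lr_feature_extractor},
        {"params": list(classifier_params), "lr": cfg.lr_classifier},
    ]


def sgd_kwargs(cfg):
    """Options shared by both groups."""
    return {"momentum": cfg.momentum, "weight_decay": cfg.weight_decay}


cfg = SetupConfig()
# Placeholder strings stand in for real model parameters in this sketch.
groups = sgd_param_groups(["feature_w"], ["classifier_w"], cfg)
shared = sgd_kwargs(cfg)
```

With PyTorch installed and real model parameters in place of the placeholder strings, the result could be passed on as `torch.optim.SGD(groups, **shared)`. How the static momentum coefficient and warm-up epochs enter training is not described in the quoted text, so they are stored here without being used.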