Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

Automatic Visual Instrumental Variable Learning for Confounding-Resistant Domain Generalization

Authors: Fuyuan CAO, Shichang Qiao, Kui Yu, Jiye Liang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on multiple benchmarks verify that VIV-DG achieves superior generalization ability. 5 Experiments 5.1 Datasets and settings 5.2 Experimental results
Researcher Affiliation	Academia	1School of Computer and Information Technology, Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan, China 2Shanxi Taihang Laboratory, Taiyuan, China 3School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China
Pseudocode	Yes	The pseudocode for VIV-DG is presented in Appendix F. Algorithm 1 Visual Instrumental Variables for Domain Generalization (VIV-DG)
Open Source Code	No	The datasets involved in this research are all publicly available real-world datasets, and we plan to release the code.
Open Datasets	Yes	We evaluate VIV-DG on several real-world benchmarks: Digits-DG [26], PACS [27], Office-Home [28], and VLCS [29]. The datasets involved in this research are all publicly available real-world datasets, and we plan to release the code.
Dataset Splits	Yes	For Digits-DG, we randomly select 600 images per class in each domain, using 80% for training and 20% for validation. For fair comparison, we use the original training-validation split provided by [27]. Following [31], we use 90% of the data for training and 10% for validation.
Hardware Specification	Yes	In our experiments, we used an NVIDIA A800 80GB PCIe GPU.
Software Dependencies	No	The paper mentions using ImageNet-pretrained ResNet-18 and ResNet-50 [51] backbones and an SGD optimizer, but it does not specify version numbers for general software dependencies like Python, PyTorch, or CUDA.
Experiment Setup	Yes	The hyperparameter settings are summarized in Table 6, encompassing the various datasets and training stages employed in our experiments. Phase I (Initial causal factor extraction): The loss function Lcausal incorporates two hyperparameters: β1 and τ. β1 is set per dataset (Digits-DG = 1, PACS = 0.5, Office-Home = 0.1, VLCS = 0.1), while τ adapts dynamically according to the number of training epochs. This scheme ensures rapid convergence across datasets and stabilizes causal factor extraction in early training. Phase II (Visual IV learning) & Phase III (Regressor training): To reduce tuning complexity, Phases II and III adopt identical hyperparameter settings across all datasets. Specifically, in Phase II's total loss Ltotal_IV, we set α1 = 1, α2 = 0.5, and α3 = 0.5. In Phase III, the regressor loss Lreg carries a default weight of 1 to maintain stability. Phase I+ (Causal factor refinement and debiasing): The loss L+causal reuses β1 and τ from Phase I while introducing a dynamically adjusted β2. This adaptive strategy enables the model to flexibly respond to distributional shifts and further refine debiasing.
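
For readers trying to reproduce the setup, the Digits-DG split and the fixed hyperparameters quoted in the table can be written down as a small sketch. This is not the authors' code, which had not been released at the time of classification. The function name `sample_and_split`, the `seed` argument, the per-class (stratified) 80/20 cut, and the dictionary names are all assumptions made for illustration. The values of τ and β2 are omitted because the table only says they are adjusted dynamically.

```python
import random
from collections import defaultdict

# Fixed values quoted in the "Experiment Setup" row of the table.
# The dictionary and constant names are hypothetical, not from the paper.
BETA1 = {"Digits-DG": 1.0, "PACS": 0.5, "Office-Home": 0.1, "VLCS": 0.1}
PHASE2_ALPHAS = {"alpha1": 1.0, "alpha2": 0.5, "alpha3": 0.5}
PHASE3_REG_WEIGHT = 1.0


def sample_and_split(labels, per_class=600, train_frac=0.8, seed=0):
    """Pick `per_class` random items per class from one domain, then split.

    `labels` is a list of class labels, one per image in the domain.
    Returns (train_indices, val_indices) into that list.
    Assumption: the 80/20 cut is applied within each class; the quoted
    text does not say whether the split is stratified.
    """
    rng = random.Random(seed)  # seeded for repeatability (assumption)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for label in sorted(by_class):
        pool = by_class[label]
        # rng.sample returns the chosen items in random order
        chosen = rng.sample(pool, min(per_class, len(pool)))
        cut = int(round(train_frac * len(chosen)))
        train_idx.extend(chosen[:cut])
        val_idx.extend(chosen[cut:])
    return train_idx, val_idx


if __name__ == "__main__":
    # Toy domain: 10 classes with 1000 images each.
    toy_labels = [c for c in range(10) for _ in range(1000)]
    train, val = sample_and_split(toy_labels)
    print(len(train), len(val))
```

With 10 classes and 600 images per class, this yields 480 training and 120 validation images per class in each domain. The PACS split (taken from [27]) and the 90/10 split that follows [31] are not sketched here, since the table does not describe how they are constructed.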