Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Learning Instrumental Variable from Data Fusion for Treatment Effect Estimation

Authors: Anpeng Wu, Kun Kuang, Ruoxuan Xiong, Minqin Zhu, Yuxuan Liu, Bo Li, Furui Liu, Zhihua Wang, Fei Wu

AAAI 2023 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical results demonstrate the advantages of our Meta-EM compared with state-of-the-art methods. and Empirical results demonstrate the advantages of the GIV compared with SOTA methods.
Researcher Affiliation	Collaboration	Anpeng Wu (1), Kun Kuang (1, *), Ruoxuan Xiong (2), Minqin Zhu (1), Yuxuan Liu (1), Bo Li (3), Furui Liu (4), Zhihua Wang (5, 6), Fei Wu (1, 5, 6). (1) Department of Computer Science and Technology, Zhejiang University; (2) Department of Quantitative Theory and Methods, Emory University; (3) School of Economics and Management, Tsinghua University; (4) Huawei Noah's Ark Lab; (5) Shanghai AI Laboratory; (6) Shanghai Institute for Advanced Study of Zhejiang University
Pseudocode	No	The paper describes the algorithm steps in narrative text and diagrams (Figure 2), but does not present a structured pseudocode block or a section explicitly labeled 'Pseudocode' or 'Algorithm X'.
Open Source Code	Yes	The project page with the code and the Supplementary materials is available at https://github.com/causal-machinelearning-lab/meta-em.
Open Datasets	Yes	Similar to previous methods (Nie et al. 2020; Hartford et al. 2017; Bica, Jordon, and van der Schaar 2020; Schwab et al. 2020), we perform experiments on two real-world datasets IHDP (footnote 4) (Shalit, Johansson, and Sontag 2017) & PM-CMR (footnote 5) (Wyatt et al. 2020), as the true effect function is rarely available for real-world data. Then we use the continuous variables from IHDP & PM-CMR to replace the covariates X in Eq. (14)&(16) to generate treatment T and outcome Y, respectively. Both two datasets are randomly split into training (63%), validation (27%), and testing (10%). Footnote 4: IHDP: https://www.fredjo.com/. Footnote 5: PM-CMR: https://pasteur.epa.gov/uploads/10.23719/1506014/SES PM25 CMR data.zip
Dataset Splits	Yes	Both two datasets are randomly split into training (63%), validation (27%), and testing (10%).
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies	No	The paper does not specify software dependencies with version numbers.
Experiment Setup	Yes	For synthetic datasets, we sample 3,000 units and perform 10 independent replications to report mean squared error (MSE) and standard deviations of the individual treatment effect estimation over the testing data (3000 units) that we intervene the treatment as T = do(t). To verify the effectiveness of GIVEM in different scenarios with different dimensions of covariates m_X and different group numbers K, we use Data-K-m_X to denote the different scenarios. In this paper, we set the representation dimension as m_R = m_X.
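
The split proportions and the replication-based reporting quoted in the table can be illustrated with a minimal sketch. This is not the authors' code: the function names `split_indices` and `summarize_mse`, the seed, and the choice of population standard deviation are all assumptions made for illustration, since the paper excerpt does not specify them.

```python
import random
import statistics


def split_indices(n, fractions=(0.63, 0.27, 0.10), seed=0):
    """Randomly partition range(n) into train/validation/test index lists.

    The 63/27/10 proportions come from the quoted text; the seed and the
    rounding rule are assumptions.
    """
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test split
    return train, val, test


def summarize_mse(mse_per_run):
    """Return (mean, standard deviation) of MSE over independent replications.

    Population standard deviation is used here as an assumption; the quoted
    text does not say which estimator was reported.
    """
    return statistics.mean(mse_per_run), statistics.pstdev(mse_per_run)


if __name__ == "__main__":
    train, val, test = split_indices(3000)
    print(len(train), len(val), len(test))
```

With 10 replications, `summarize_mse` would be called on a list of 10 per-run MSE values, one per independently generated dataset.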