CounterFactual Regression with Importance Sampling Weights

Authors: Negar Hassanpour, Russell Greiner

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results on two publicly available benchmarks demonstrate that the proposed method significantly outperforms state-of-the-art. 4 Experiments. Table 1: ENoRMSE, PEHE, and ϵ_ATE performance measures (lower is better), each of the form mean (standard deviation), on the IHDP benchmark. Table 2: Aggregated ENoRMSE (lower is better) on the ACIC 18 benchmark. (The PEHE and ATE error measures are sketched after the table.)
Researcher Affiliation | Academia | Negar Hassanpour and Russell Greiner, Department of Computing Science, University of Alberta, Canada. {hassanpo, rgreiner}@ualberta.ca
Pseudocode | Yes | Algorithm 1 CFR-ISW: CounterFactual Regression with Importance Sampling Weights. (An illustrative weighting sketch follows the table.)
Open Source Code | No | The paper mentions in the Acknowledgements: 'We wish to thank Dr. Martha White and Junfeng Wen for fruitful conversations, and Dr. Fredrik Johansson for publishing/maintaining the code-base for the CFR method online.' This refers to the code for the CFR method (a baseline), not the code for the authors' proposed CFR-ISW method.
Open Datasets | Yes | To make performance comparison easier, however, we do not synthesize our own datasets here. Instead, we use two publicly available benchmarks (see Sec. 4.3). Infant Health and Development Program (IHDP) [...] We worked with the same dataset provided by and used in [Shalit et al., 2017; Johansson et al., 2016; Johansson et al., 2018]. Atlantic Causal Inference Conference 2018 (ACIC 18) [...] the covariates matrix for each of these datasets is sub-sampled from a covariates table of real-world medical measurements taken from the Linked Birth and Infant Death Data (LBIDD) [MacDorman and Atkinson, 1998].
Dataset Splits | Yes | We report the methods' performances by averaging over 100 realizations of outcomes with 63/27/10 train/validation/test splits. (A split sketch follows the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments (e.g., CPU/GPU models, memory, or cloud instance types).
Software Dependencies | No | The paper mentions a 'gradient descent optimizer', the 'Adam optimizer', 'elu as the non-linear activation function', and 'Maximum Mean Discrepancy (MMD)', but does not provide version numbers for any software libraries, frameworks, or tools (e.g., Python, TensorFlow, PyTorch). (An MMD sketch follows the table.)
Experiment Setup | Yes | Hyperparameter Selection. Table 3: Hyperparameters and ranges (e.g., imbalance parameter α ∈ {1E-2, 1E-1, 1E0, 1E1}, number of representation layers ∈ {3, 5}, batch size ∈ {100, 300}). We trained CFR-ISW's π0 logistic regression function with a gradient descent optimizer and a learning rate of 1E-3. For both CFR and CFR-ISW, we trained the Φ and h_t networks with regularization coefficient λ = 1E-3, elu as the non-linear activation function, the Adam optimizer [Kingma and Ba, 2015], a learning rate of 1E-3, and a maximum of 3000 iterations. (These settings are collected in a configuration sketch after the table.)
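
The IHDP error measures quoted in the Research Type row have standard definitions: PEHE is the mean squared error between true and predicted individual treatment effects (its square root is usually reported), and ϵ_ATE is the absolute error of the estimated average treatment effect. The sketch below is a minimal NumPy illustration of these two quantities only; the aggregated ENoRMSE score for ACIC 18 is benchmark-specific and is not reproduced here.

```python
import numpy as np

def sqrt_pehe(tau_true, tau_pred):
    """Square root of PEHE: RMSE between true and predicted individual treatment effects."""
    tau_true = np.asarray(tau_true, dtype=float)
    tau_pred = np.asarray(tau_pred, dtype=float)
    return np.sqrt(np.mean((tau_pred - tau_true) ** 2))

def eps_ate(tau_true, tau_pred):
    """Absolute error of the estimated Average Treatment Effect (ATE)."""
    return np.abs(np.mean(tau_pred) - np.mean(tau_true))

# Toy usage with made-up effects (illustrative values only):
tau_true = np.array([1.0, 2.0, 0.5, 1.5])
tau_pred = np.array([0.8, 2.3, 0.4, 1.6])
print(sqrt_pehe(tau_true, tau_pred), eps_ate(tau_true, tau_pred))
```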
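Algorithm 1 (CFR-ISW) re-weights CFR's factual regression loss with importance sampling weights obtained from a propensity model π0 fitted on the learned representation. The sketch below only illustrates that idea under stated assumptions: the weight form 1 + (1 − p)/p, where p is the estimated probability of the treatment actually received, and the use of scikit-learn's LogisticRegression as a stand-in for π0 are simplifications, not the paper's exact derivation or training procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def importance_sampling_weights(phi, t):
    """Fit a propensity model pi0(t=1 | Phi(x)) on the representation and return
    per-sample weights. The weight form 1 + (1 - p_received) / p_received is an
    illustrative assumption, not necessarily the paper's exact formula."""
    pi0 = LogisticRegression(max_iter=1000).fit(phi, t)
    p_treated = pi0.predict_proba(phi)[:, 1]
    p_received = np.where(t == 1, p_treated, 1.0 - p_treated)
    p_received = np.clip(p_received, 1e-3, 1.0)  # guard against division blow-ups
    return 1.0 + (1.0 - p_received) / p_received

def weighted_factual_loss(y_true, y_pred, weights):
    """Weighted squared-error factual loss used to re-weight the regression term."""
    return np.mean(weights * (y_true - y_pred) ** 2)

# Toy usage with random data (purely illustrative):
rng = np.random.default_rng(0)
phi = rng.normal(size=(200, 5))      # learned representation Phi(x)
t = rng.integers(0, 2, size=200)     # binary treatment assignment
w = importance_sampling_weights(phi, t)
```

In the full objective, such per-sample weights multiply the factual prediction loss, which is minimized together with an imbalance penalty on the representation and the regularization term quoted in the Experiment Setup row.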
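The 63/27/10 protocol in the Dataset Splits row is applied per outcome realization. The helper below is an illustrative random split, not the authors' code; only the proportions come from the paper (the 747-unit count used in the usage line is the standard IHDP sample size).

```python
import numpy as np

def split_indices(n, seed=0, fractions=(0.63, 0.27, 0.10)):
    """Randomly split n sample indices into train/validation/test
    using the 63/27/10 proportions reported in the paper."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(round(fractions[0] * n))
    n_valid = int(round(fractions[1] * n))
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]

train_idx, valid_idx, test_idx = split_indices(747)  # IHDP has 747 units
```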
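The Maximum Mean Discrepancy mentioned under Software Dependencies is the imbalance measure computed between treated and control representations. Below is a minimal biased estimator of the squared MMD with an RBF kernel; the kernel choice and the bandwidth sigma are illustrative assumptions, since the quoted text does not specify which MMD variant or bandwidth the paper uses.

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel matrix between rows of a and rows of b."""
    sq_dists = np.sum(a ** 2, 1)[:, None] + np.sum(b ** 2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd2_rbf(phi_treated, phi_control, sigma=1.0):
    """Biased estimator of the squared MMD between treated and control representations."""
    k_tt = rbf_kernel(phi_treated, phi_treated, sigma)
    k_cc = rbf_kernel(phi_control, phi_control, sigma)
    k_tc = rbf_kernel(phi_treated, phi_control, sigma)
    return k_tt.mean() + k_cc.mean() - 2.0 * k_tc.mean()
```

In CFR-style objectives this penalty is scaled by the imbalance parameter α listed in Table 3.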
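Finally, the fixed training choices and searched ranges quoted in the Experiment Setup row are collected below for convenience. Only values quoted above are recorded; the dictionary key names are hypothetical.

```python
# Fixed training settings quoted from the paper's experiment setup.
fixed_config = {
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "l2_regularization": 1e-3,                    # lambda
    "activation": "elu",
    "max_iterations": 3000,
    "propensity_optimizer": "gradient descent",   # for the pi0 logistic regression
    "propensity_learning_rate": 1e-3,
}

# Searched hyperparameter ranges quoted from Table 3 (only the subset listed in this report).
search_space = {
    "imbalance_alpha": [1e-2, 1e-1, 1e0, 1e1],
    "num_representation_layers": [3, 5],
    "batch_size": [100, 300],
}
```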