Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

ABC3: Active Bayesian Causal Inference with Cohn Criteria in Randomized Experiments

Authors: Taehun Cha, Donghun Lee

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Through extensive experiments on real-world data sets, ABC3 achieves the highest efficiency, while empirically showing the theoretical results hold. In this section, we empirically analyze the theoretical results introduced in Section 4. For the comparison, we utilize IHDP (Brooks-Gunn, Liaw, and Klebanov (1992) and Hill (2011)), Boston (Harrison and Rubinfeld 1978), ACIC (Gruber et al. 2019), and Lalonde (LaLonde 1986) data sets.
Researcher Affiliation Academia Taehun Cha and Donghun Lee* Korea University EMAIL
Pseudocode Yes Algorithm 1: ABC3
Input: current time step t, whole covariate set X_Ω, covariate distribution P, previous observations X_t^1 and X_t^0, kernel k, noise parameter σ_ε
Output: x_{t+1}, a_{t+1}
1: V^0, V^1 = ∅, ∅
2: for x ∈ X_Ω \ (X_t^0 ∪ X_t^1) do
3:   compute k_{t+1}^1 and k_{t+1}^0 assuming x_{t+1} = x
4:   v^0, v^1 = Equation (1) for each a ∈ {0, 1}
5:   V^0 = V^0 ∪ {v^0}, V^1 = V^1 ∪ {v^1}
6: end for
7: i, a_{t+1} = arg max(V^0 ∥ V^1)
8: x_{t+1} = (X_Ω \ (X_t^0 ∪ X_t^1))[i]
9: return x_{t+1}, a_{t+1}
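The acquisition loop in Algorithm 1 can be sketched in Python. Here `variance_term` is a hypothetical stand-in for the Equation (1) quantity computed from the kernel vectors; the function name and signature are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def abc3_step(candidates, X0, X1, variance_term):
    """One acquisition step in the style of Algorithm 1.

    `variance_term(X_a, x)` is a hypothetical callable standing in for
    the Equation-(1) computation on the kernel vectors (an assumption).
    """
    V0, V1 = [], []
    for x in candidates:                      # x in X_Omega \ (X0 U X1)
        V0.append(variance_term(X0, x))       # v0, assuming x_{t+1} = x
        V1.append(variance_term(X1, x))       # v1, assuming x_{t+1} = x
    scores = np.concatenate([V0, V1])         # V0 || V1
    i = int(np.argmax(scores))                # joint arg max over both arms
    a_next = 0 if i < len(candidates) else 1  # which arm produced the max
    x_next = candidates[i % len(candidates)]
    return x_next, a_next
```

The `i % len(candidates)` step recovers the covariate index from the concatenated score vector, mirroring line 8 of the pseudocode.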
Open Source Code Yes Code https://github.com/AIML-K/ActiveBayesianCausal/
Open Datasets Yes For the comparison, we utilize IHDP (Brooks-Gunn, Liaw, and Klebanov (1992) and Hill (2011)), Boston (Harrison and Rubinfeld 1978), ACIC (Gruber et al. 2019), and Lalonde (LaLonde 1986) data sets.
Dataset Splits Yes We randomly divide each data set in half for every trial to construct train and test data sets.
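The per-trial half split described in the quote can be sketched with NumPy; the `half_split` helper and its seeding are assumptions for illustration, not the paper's code:

```python
import numpy as np

def half_split(X, y, seed=0):
    # Randomly divide the data set in half into train/test for one trial;
    # a fresh seed per trial reproduces the "for every trial" resampling.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    half = len(X) // 2
    train, test = idx[:half], idx[half:]
    return (X[train], y[train]), (X[test], y[test])
```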
Hardware Specification No No specific hardware details (like GPU/CPU models or specific machine configurations) are provided in the paper. The paper only mentions that "most policies require less than 1 second to sample the whole data set" which refers to performance, not specific hardware.
Software Dependencies No The paper mentions "We optimize the kernel hyperparameters using scikit-learn package." but does not provide a specific version number for scikit-learn or any other software dependencies.
Experiment Setup Yes We apply feature-wise normalization and y-standardization for all regressors (except Leverage, which requires item-wise normalization). We fit two Gaussian process models with Constant Kernel * Radial Basis Function (RBF) Kernel + White Kernel. We optimize the kernel hyperparameters using the scikit-learn package. All the uncertainty-aware policies (ABC3, MacKay and ACE) use the Gaussian process to quantify the uncertainty. For the uncertainty-quantifying kernels, we use an RBF kernel with length scale 1.0 and σ²_ε = 1. (We check the hyperparameter sensitivity in Section 5.4.)
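The quoted setup maps directly onto scikit-learn's Gaussian process API. A minimal sketch: the kernel composition, RBF length scale 1.0, and noise σ²_ε = 1 follow the quote, while `normalize_y=True` is assumed here as the y-standardization step:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF, WhiteKernel

# Constant * RBF + White, as quoted; length_scale=1.0 and noise_level=1.0
# match the stated hyperparameters. normalize_y is an assumption standing
# in for the paper's y-standardization.
kernel = ConstantKernel() * RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
```

Fitting the regressor optimizes the kernel hyperparameters by maximizing the marginal likelihood, which is what "optimize the kernel hyperparameters using scikit-learn" amounts to in this API.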