Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Counterfactual Identification Under Monotonicity Constraints

Authors: Aurghya Maiti, Drago Plecko, Elias Bareinboim

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type: Experimental
"Finally, in Sec. 4, we demonstrate our methods on both synthetic and real data (analyzing the data from (Abadie 2003), as mentioned in the example), showcasing their practical utility."
Researcher Affiliation: Academia
"Causal Artificial Intelligence Laboratory, Columbia University. EMAIL, EMAIL, EMAIL"
Pseudocode: Yes
"Algorithm 1: M-ID. Input: causal graph with monotonicity constraints ⟨G, M⟩, set of counterfactual terms X = x, Y = y, available distributions Z. Output: P(Y = y | X = x) in terms of available distributions."
Open Source Code: No
"The paper does not provide any explicit statement or link indicating that source code for the described methodology is publicly available or included in supplementary materials."
Open Datasets: Yes
"To demonstrate this, we use the 401(k) dataset, which is a sample of financial data of individuals drawn from the 1991 Survey of Income and Program Participation (SIPP). ... (Abadie 2003; Chernozhukov et al. 2018) have studied the LATE of 401(k) participation on net financial assets by using eligibility as an instrument."
Dataset Splits: No
"The paper mentions discretizing data into quartiles and generating 30,000 or 10,000 data points from a synthetic SCM, but it does not specify explicit training/test/validation splits or their proportions for experimental reproduction."
Hardware Specification: No
"The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications, or cloud computing instance types) used for running the experiments."
Software Dependencies: No
"The paper does not list the ancillary software needed to replicate the experiments, such as library names with version numbers or a particular solver and its version."
Experiment Setup: No
"The paper describes how the synthetic SCMs were designed and how data points were generated, including variable discretizations (e.g., into quartiles) and the number of sampled data points (30,000 or 10,000). However, it does not report hyperparameters or system-level settings typically associated with training machine learning models, such as learning rates, batch sizes, optimizers, or epochs."
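The dataset and setup rows above mention two concrete ingredients: estimating the LATE of 401(k) participation with eligibility as an instrument, and discretizing variables into quartiles from 30,000 synthetic samples. The sketch below is a hedged illustration of those two steps, not the paper's code: it uses a hypothetical synthetic stand-in for the 401(k) setting (variable names z, x, y and the effect size are invented for the example), a textbook Wald/IV estimator, and NumPy quartile binning.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30_000  # matches the sample size mentioned for the synthetic experiments

# Hypothetical stand-in for the 401(k) setting:
# z = eligibility (instrument), x = participation (treatment), y = net assets (outcome).
z = rng.binomial(1, 0.4, n)
compliers = rng.binomial(1, 0.7, n)     # compliers participate iff eligible
x = z * compliers                       # monotone: eligibility never discourages participation
y = 5000 * x + rng.normal(0, 1000, n)   # true effect on compliers chosen as 5000 for the demo

# Wald estimator: ratio of the reduced-form effect to the first-stage effect,
# which identifies the LATE under monotonicity.
late = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())

# Quartile discretization, as described for the experiments: bin the
# continuous outcome into four groups at its empirical quartiles.
quartiles = np.quantile(y, [0.25, 0.5, 0.75])
y_binned = np.digitize(y, quartiles)    # integer labels 0..3
```

With this synthetic data the Wald ratio recovers a value close to the planted complier effect of 5000; the same two operations (IV ratio, quartile binning) are all the setup detail the paper's text pins down.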