Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Evaluating Instrument Validity using the Principle of Independent Mechanisms

Authors: Patrick F. Burauel

JMLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Monte Carlo studies show a high accuracy of the procedure. We apply our method to two empirical studies: first, we can corroborate the narrative justification given by Card (1995) for the validity of college proximity as an instrument for educational attainment in his work on the financial returns to education. Second, we cannot reject the validity of past savings rates as an instrument for economic development to estimate its causal effect on democracy (Acemoglu et al., 2008).
Researcher Affiliation	Academia	Patrick F. Burauel EMAIL California Institute of Technology Pasadena, CA, USA
Pseudocode	Yes	The test procedure is succinctly described in Algorithm 1. In the following main text we provide a description that focuses on the intuition. We denote the degree of confounding as measured by the method laid out in Janzing and Schölkopf (2018a) (JS) in a multivariate linear model with X as independent variables and Y as dependent variable with κ({X}; Y). Algorithm 1: Test for instrument validity. Algorithm 2: Generate synthetic Ts. Algorithm 3: Simulation of Violation of Exclusion Restriction. Algorithm 4: Simulation of Violation of Exchangeability Assumption.
Open Source Code	No	The paper does not provide a specific link to source code or an explicit statement about the release of their implementation code for the methodology described.
Open Datasets	Yes	First, we use data from Card (1995) to test the validity of {proximity to college} as an instrument for {educational attainment} in an effort to estimate the causal effect on {earnings}, see Section 6.1. Second, we apply the test to evaluate the validity of {past saving rates} as an instrument for {economic development} in a study by Acemoglu et al. (2008) that attempts to understand its causal effect on {democratic development}, see Section 6.2. Card uses a sample of roughly 3,500 individuals from the National Longitudinal Surveys of Youth (NLSY, Cooksey, 2018). We use the data provided by Acemoglu et al. (2008) to evaluate the validity of {past savings rates} as an instrument.
Dataset Splits	No	The paper describes how synthetic data is generated for Monte Carlo studies, specifying parameters like the number of observations and covariates. However, for the empirical applications using existing datasets (Card, Acemoglu et al.), it only mentions the total number of observations but does not provide specific training/test/validation splits or other data partitioning methodologies.
Hardware Specification	No	The paper does not provide specific details about the hardware used to run the experiments.
Software Dependencies	No	The paper mentions using the 'R package optim' but does not specify a version number or list other key software components with their versions, which is required for reproducibility.
Experiment Setup	No	The paper provides details on how data is generated for Monte Carlo simulations (e.g., number of observations, covariates, variance of errors, bootstrap samples). However, it does not specify concrete hyperparameters (like learning rates, batch sizes, optimizers) or system-level training configurations for their method's implementation on either simulated or real-world data.
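
The Experiment Setup row above refers to Monte Carlo data generation with a stated number of observations and error variances. As a rough illustration of what a fully specified, seeded simulation of that kind can look like, here is a minimal sketch. It is not the paper's Algorithm 1 or Algorithm 3: the linear model, the unit coefficients, the function names (simulate_iv, iv_estimate), and the simple ratio-of-covariances IV estimator are all assumptions chosen for illustration.

```python
import random


def simulate_iv(n, beta, gamma, seed):
    """Draw n observations from a hypothetical linear IV model.

    z is the instrument, u an unobserved confounder, x the treatment,
    and y the outcome. gamma is a direct effect of z on y; any nonzero
    gamma violates the exclusion restriction.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    data = []
    for _ in range(n):
        z = rng.gauss(0, 1)
        u = rng.gauss(0, 1)
        x = z + u + rng.gauss(0, 1)
        y = beta * x + u + gamma * z + rng.gauss(0, 1)
        data.append((z, x, y))
    return data


def sample_cov(a, b):
    """Unbiased sample covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    total = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    return total / (len(a) - 1)


def iv_estimate(data):
    """Single-instrument IV estimate: cov(z, y) / cov(z, x)."""
    z, x, y = zip(*data)
    return sample_cov(z, y) / sample_cov(z, x)


if __name__ == "__main__":
    valid = simulate_iv(n=20000, beta=1.0, gamma=0.0, seed=0)
    invalid = simulate_iv(n=20000, beta=1.0, gamma=0.5, seed=0)
    print("valid instrument:  ", iv_estimate(valid))
    print("invalid instrument:", iv_estimate(invalid))
```

In this toy model the covariance of z and x is 1, so the estimate converges to beta + gamma: close to beta when the instrument is valid, and shifted by gamma when the exclusion restriction fails. Recording n, beta, gamma, and the seed is the kind of detail that makes such a run repeatable.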