Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Data-Driven Selection of Instrumental Variables for Additive Nonlinear, Constant Effects Models
Authors: Xichen Guo, Feng Xie, Yan Zeng, Hao Zhang, Zhi Geng
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on both synthetic and two real-world datasets demonstrate the effectiveness and robustness of our proposed approach, highlighting its potential for broader applications in causal analysis. |
| Researcher Affiliation | Academia | 1 Department of Applied Statistics, Beijing Technology and Business University, Beijing, China 2 SIAT, Chinese Academy of Sciences, Shenzhen, China. Correspondence to: Feng Xie <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 CAT |
| Open Source Code | Yes | The source code is available in the Supplementary Material. |
| Open Datasets | Yes | Colonial Origins Data (Acemoglu et al., 2001). Children and Mothers Labor Supply Data (Angrist & Evans, 1996). |
| Dataset Splits | No | The paper uses synthetic data with specified sample sizes (1k, 3k, 5k) and, for one real-world dataset, mentions randomly selecting 5% of the data for testing and averaging over 10 repeated tests, but it does not specify explicit train/validation/test splits for model training and evaluation. |
| Hardware Specification | Yes | All experiments were performed with Intel 2.90 GHz and 2.89 GHz CPUs and 128 GB of memory. |
| Software Dependencies | No | The paper mentions several R packages (RobustIV, CIIV, sisVIVE) and their availability for *comparison methods* (TSHT, CIIV, sisVIVE, MR-Egger) but does not provide specific version numbers for the software used to implement the proposed CAT algorithm. |
| Experiment Setup | No | The paper describes data generation mechanisms for synthetic data (including noise distributions and coefficient ranges) and mentions some details about real-world data processing (e.g., sample selection, variable definitions). However, it does not explicitly detail hyperparameters, optimizer settings, or other system-level training configurations for their proposed algorithm. |