Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Identifiable Causal Inference with Noisy Treatment and No Side Information

Authors: Antti Pöllänen, Pekka Marttinen

TMLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical results demonstrate the method's good performance with unknown measurement error. More broadly, our work extends the range of applications in which reliable causal inference can be conducted. We evaluate our algorithm on a wide variety of synthetic datasets, as well as semi-synthetic data.
Researcher Affiliation	Academia	Antti Pöllänen EMAIL Department of Computer Science Aalto University Pekka Marttinen EMAIL Department of Computer Science Aalto University
Pseudocode	Yes	Algorithm 1 Generation of synthetic datasets using GPs
Open Source Code	Yes	The algorithm was implemented in PyTorch, with code available for replicating the experiments at https://github.com/antti-pollanen/ci_noisy_treatment.
Open Datasets	Yes	We also test CEME with semisynthetic data based on a dataset curated by Card (1995) from data from the National Longitudinal Survey of Young Men (NLSYM), conducted between years 1966 and 1981.
Dataset Splits	Yes	The different training dataset sizes used are 1000, 4000, and 16000 data points. The test data (used for evaluating the models) consist of 20000 data points. [...] The full data of 2990 points is split into 72% of training data, 8% of validation data (used for learning rate annealing and early stopping) and 20% of test data (used for evaluating the models), all amounts rounded to the nearest integer.
Hardware Specification	No	The paper does not explicitly mention any specific hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies	No	The algorithm was implemented in PyTorch, with code available for replicating the experiments at https://github.com/antti-pollanen/ci_noisy_treatment. (No version is specified for PyTorch or any other software dependency.)
Experiment Setup	Yes	Further training details are available in Appendix B. The hyperparameter values used are listed in Table 1. They were optimized using a random parameter search. [...] The hyperparameter values used are listed in Table 2. The hyperparameters are shared by all algorithms and were optimized using a random search.
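The 72% / 8% / 20% split of the 2990-point semisynthetic dataset described in the Dataset Splits row can be checked with a short calculation. The sketch below assumes each fraction is rounded independently with Python's built-in round(), which agrees with "rounded to the nearest integer" here because none of the products lands on a .5 tie. The helper names split_sizes and split_indices and the seeded random permutation are illustrative assumptions, not taken from the authors' repository.

```python
import random


def split_sizes(n, fractions=(0.72, 0.08, 0.20)):
    """Hypothetical helper: round each fraction of n to the nearest integer."""
    return [round(n * f) for f in fractions]


def split_indices(n, seed=0):
    """Hypothetical helper: shuffle indices 0..n-1 and cut them into train/validation/test."""
    n_train, n_val, n_test = split_sizes(n)
    # Independent rounding is not guaranteed to sum to n in general, so check it.
    assert n_train + n_val + n_test == n, "rounded sizes must cover the dataset"
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded so the split is reproducible
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test


train, val, test = split_indices(2990)
print(len(train), len(val), len(test))  # 2153 239 598
```

With n = 2990 the rounded sizes are 2153 (2152.8), 239 (239.2) and 598, which sum to the full 2990 points; for other dataset sizes, independent rounding can miss the total by one, which the assertion would catch.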