Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Identifiable Causal Inference with Noisy Treatment and No Side Information
Authors: Antti Pöllänen, Pekka Marttinen
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate the method's good performance with unknown measurement error. More broadly, our work extends the range of applications in which reliable causal inference can be conducted. We evaluate our algorithm on a wide variety of synthetic datasets, as well as semi-synthetic data. |
| Researcher Affiliation | Academia | Antti Pöllänen EMAIL Department of Computer Science Aalto University Pekka Marttinen EMAIL Department of Computer Science Aalto University |
| Pseudocode | Yes | Algorithm 1 Generation of synthetic datasets using GPs |
| Open Source Code | Yes | The algorithm was implemented in PyTorch, with code available for replicating the experiments at https://github.com/antti-pollanen/ci_noisy_treatment. |
| Open Datasets | Yes | We also test CEME with semisynthetic data based on a dataset curated by Card (1995) from data from the National Longitudinal Survey of Young Men (NLSYM), conducted between years 1966 and 1981. |
| Dataset Splits | Yes | The different training dataset sizes used are 1000, 4000, and 16000 data points. The test data (used for evaluating the models) consist of 20000 data points. [...] The full data of 2990 points is split into 72% of training data, 8% of validation data (used for learning rate annealing and early stopping) and 20% of test data (used for evaluating the models), all amounts rounded to the nearest integer. |
| Hardware Specification | No | The paper does not explicitly mention any specific hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The algorithm was implemented in PyTorch, with code available for replicating the experiments at https://github.com/antti-pollanen/ci_noisy_treatment. (No version is specified for PyTorch or any other software dependency.) |
| Experiment Setup | Yes | Further training details are available in Appendix B. The hyperparameter values used are listed in Table 1. They were optimized using a random parameter search. [...] The hyperparameter values used are listed in Table 2. The hyperparameters are shared by all algorithms and were optimized using a random search. |
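
The semisynthetic split quoted in the Dataset Splits row (2990 points divided 72% / 8% / 20%, each rounded to the nearest integer) can be checked with a short arithmetic sketch. The function name `split_sizes` and its parameters are hypothetical and do not come from the paper or its repository; the sketch only reproduces the rounding arithmetic and does not show how the authors actually partition the data.

```python
def split_sizes(n_total, train_pct=72, val_pct=8, test_pct=20):
    """Return (train, val, test) sizes, each rounded to the nearest integer.

    Hypothetical helper for illustration only. Percentages are given as
    integers to avoid floating-point surprises in the products.
    """
    assert train_pct + val_pct + test_pct == 100
    train = round(n_total * train_pct / 100)  # 2990 * 0.72 = 2152.8
    val = round(n_total * val_pct / 100)      # 2990 * 0.08 = 239.2
    test = round(n_total * test_pct / 100)    # 2990 * 0.20 = 598.0
    return train, val, test


train, val, test = split_sizes(2990)
print(train, val, test, train + val + test)
```

For 2990 points the three rounded sizes add up to the full dataset, so no point is dropped or double-counted. Independent rounding does not guarantee this for other totals, and Python's `round` resolves exact halves to the nearest even integer, so a different dataset size might need an explicit remainder rule.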