Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Amortized Inference of Causal Models via Conditional Fixed-Point Iterations

Authors: Divyat Mahajan, Jannes Gladrow, Agrin Hilmkil, Cheng Zhang, Meyer Scetbon

TMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical results show that our amortized procedure performs on par with baselines trained specifically for each dataset on both in and out-of-distribution problems, and also outperforms them in scarce data regimes.
Researcher Affiliation	Collaboration	Divyat Mahajan¹, Jannes Gladrow², Agrin Hilmkil², Cheng Zhang², Meyer Scetbon²; ¹Mila, Université de Montréal, ²Microsoft Research
Pseudocode	Yes	A.4 Pseudo Code. Algorithm 1: Cond-FiP Part 1: Dataset Encoder µ(D_X, G) ... Algorithm 2: Cond-FiP Part 2: Conditional Fixed-Point Decoder T(z, D_X, G)
Open Source Code	Yes	The code is available on GitHub: microsoft/causica.
Open Datasets	Yes	We use the synthetic data generation procedure proposed by Lorch et al. (2022) to generate SCMs... We further evaluate Cond-FiP on test datasets generated using C-Suite (Geffner et al., 2022)... experiments on real-world instances using the flow cytometry dataset (Sachs et al., 2005) and ecoli dataset (Scutari, 2010).
Dataset Splits	Yes	Test Datasets. We evaluate the model's generalization both in-distribution and out-of-distribution by sampling test datasets from P_IN and P_OUT, respectively... For each SCM we generate n_test = 800 samples, split equally into task context D_X and queries D_X for evaluation... We split this into context D_X^context ∈ R^(n_context × d) and queries D_X^query ∈ R^(n_query × d), each of size n_context = n_query = 400.
Hardware Specification	Yes	We trained Cond-FiP on a single L40 GPU with 48GB of memory, using an effective batch size of 8 with gradient accumulation.
Software Dependencies	No	The paper mentions "Adam optimizer (Paszke et al., 2017)", which points to a PyTorch publication, but no explicit version numbers for PyTorch or other libraries are provided. Therefore, specific ancillary software details with versions are missing.
Experiment Setup	Yes	For both the dataset encoder and Cond-FiP, we set the embedding dimension to d_h = 256 and the hidden dimension of MLP blocks to 512. Both of our transformer-based models contain 4 attention layers and each attention consists of 8 attention heads. The models were trained for a total of 10k epochs with the Adam optimizer (Paszke et al., 2017), where we used a learning rate of 1e-4 and a weight decay of 5e-9. Each epoch contains 400 randomly generated datasets.
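
The dataset split and hyperparameters quoted in the table can be gathered into a short Python sketch. It is an illustration only, not code from microsoft/causica: the names CondFiPConfig and split_context_query are hypothetical, and shuffling before the split is an assumption, since the quoted text says only that the 800 test samples are split equally into context and queries.

```python
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class CondFiPConfig:
    """Hypothetical container for the values quoted in the table above."""
    embed_dim: int = 256            # d_h, embedding dimension
    mlp_hidden_dim: int = 512       # hidden dimension of MLP blocks
    num_attention_layers: int = 4
    num_attention_heads: int = 8
    epochs: int = 10_000            # "10k epochs"
    datasets_per_epoch: int = 400   # randomly generated datasets per epoch
    learning_rate: float = 1e-4     # Adam learning rate
    weight_decay: float = 5e-9
    effective_batch_size: int = 8   # reached via gradient accumulation


def split_context_query(samples, n_context=400, seed=0):
    """Split samples into a context part and a query part.

    Assumption: rows are shuffled with a fixed seed before splitting;
    the source does not say how rows are assigned to each half.
    """
    if n_context > len(samples):
        raise ValueError("n_context exceeds the number of samples")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_context], shuffled[n_context:]


config = CondFiPConfig()

# Placeholder data standing in for n_test = 800 samples with d = 2 variables.
samples = [[float(i), float(i) + 0.5] for i in range(800)]
context, query = split_context_query(samples, n_context=400)

# Total number of synthetic datasets seen during training under these values.
total_training_datasets = config.epochs * config.datasets_per_epoch
```

With n_test = 800 and n_context = 400, both halves hold 400 rows, matching n_context = n_query = 400 in the Dataset Splits row.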