Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Conditional Generative Models are Sufficient to Sample from Any Causal Effect Estimand
Authors: Md Musfiqur Rahman, Matt Jordan, Murat Kocaoglu
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on a Colored MNIST dataset having both the treatment (X) and the target variables (Y) as images and sample from P(y\|do(x)). Our algorithm also enables us to conduct a causal analysis to evaluate spurious correlations among input features of generative models pre-trained on the CelebA dataset. Finally, we generate high-dimensional interventional samples from the MIMIC-CXR dataset involving text and image variables. |
| Researcher Affiliation | Academia | Md Musfiqur Rahman Purdue University Matt Jordan University of Texas at Austin Murat Kocaoglu Purdue University |
| Pseudocode | Yes | Algorithm 1 ID-GEN (Y, X, G, D, ˆX, ˆG) |
| Open Source Code | Yes | Codes are available at github.com/musfiqshohan/idgen. |
| Open Datasets | Yes | "We conduct experiments on a Colored MNIST dataset having both the treatment (X) and the target variables (Y) as images and sample from P(y\|do(x)). Our algorithm also enables us to conduct a causal analysis to evaluate spurious correlations among input features of generative models pre-trained on the CelebA dataset. Finally, we generate high-dimensional interventional samples from the MIMIC-CXR dataset involving text and image variables." and "[29] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), December 2015." and "[24] Alistair EW Johnson, Tom J Pollard, Seth J Berkowitz, Nathaniel R Greenbaum, Matthew P Lungren, Chih-ying Deng, Roger G Mark, and Steven Horng. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Scientific Data, 6(1):317, 2019." |
| Dataset Splits | Yes | Finally, a random split of the 30K images is performed: keeping 20K to be used during training, and 10K to be used as validation images. |
| Hardware Specification | Yes | We performed some of our experiments on a machine with an RTX-3090 GPU. We also performed some training on 2 A100 GPUs which took roughly 9 hours for 1000 epochs. |
| Software Dependencies | Yes | For reproducibility purposes, we provide our anonymized source code with instructions. |
| Experiment Setup | Yes | Batch sizes of 256 are used everywhere. Training is performed for 1000 epochs, which takes roughly 9 hours on 2 A100 GPUs. Sampling is performed using DDIM over 100 timesteps, with a conditioning weight of w = 1 (true conditional sampling) and noise σ = 0.3. |
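
The Dataset Splits and Experiment Setup rows can be read as the following minimal sketch. It is not taken from the authors' repository: the dictionary keys, the function name `split_indices`, and the seed value are hypothetical, and the paper excerpt does not say how the random split was seeded or implemented. Only the numeric values (30K images split 20K/10K, batch size 256, 1000 epochs, 100 DDIM timesteps, w = 1, σ = 0.3) come from the table.

```python
import random

# Values reported in the table above; the key names are hypothetical.
REPORTED_SETUP = {
    "batch_size": 256,
    "epochs": 1000,
    "ddim_timesteps": 100,
    "conditioning_weight": 1.0,  # w = 1, described as true conditional sampling
    "noise_sigma": 0.3,
}


def split_indices(n_total=30_000, n_train=20_000, seed=0):
    """Shuffle the indices 0..n_total-1 and split them into train/validation.

    The seed is an assumption made here so the split is repeatable;
    the source does not report one.
    """
    rng = random.Random(seed)
    indices = list(range(n_total))
    rng.shuffle(indices)
    return indices[:n_train], indices[n_train:]


train_idx, val_idx = split_indices()
print(len(train_idx), len(val_idx))  # 20000 10000
```

Fixing the seed keeps the two subsets identical across runs, which is one common way to make a reported random split reproducible.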