Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Diffusion Counterfactual Generation with Semantic Abduction
Authors: Rajat R Rasal, Avinash Kori, Fabio De Sousa Ribeiro, Tian Xia, Ben Glocker
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present three case studies using our mechanisms for counterfactual image generation. We begin with a toy scenario where we control the true causal data-generating process, and progressively scale up our mechanisms for causal face modelling and a novel medical artefact removal problem. We compare our mechanisms against VAE, HVAE and diffusion-based alternatives (Pawlowski et al., 2020; De Sousa Ribeiro et al., 2023; Wu et al., 2024; Sanchez & Tsaftaris, 2022) using counterfactual soundness metrics. Table 1 reports the counterfactual soundness results for simple DSCMs modelling only the mechanism d → x, assessed under random interventions do(d). |
| Researcher Affiliation | Academia | 1Department of Computing, Imperial College London, UK. |
| Pseudocode | Yes | Algorithm 1 Counterfactual Trajectory Alignment |
| Open Source Code | Yes | We present three case studies using our mechanisms for counterfactual image generation. https://github.com/RajatRasal/Diffusion-Counterfactuals |
| Open Datasets | Yes | Morpho-MNIST (Castro et al., 2019) dataset... CelebA-HQ (Karras, 2017)... EMory BrEast imaging Dataset (EMBED) (Jeong et al., 2022). |
| Dataset Splits | Yes | Table 4 (excerpt): training/validation/test splits — Morpho-MNIST: 50000/10000/10000; CMorpho-MNIST: 50000/10000/10000; CelebA: 162770/19867/19962; CelebA-HQ: 24000/3000/3000; EMBED: 13207/3300/5503. |
| Hardware Specification | Yes | dynamic semantic abduction requires 3 minutes per image, compared to 3 minutes and 3.5 minutes for the guided spatial and semantic mechanisms, respectively, using a batch size of 128 on an NVIDIA GeForce RTX 4090. |
| Software Dependencies | No | The paper includes "from torch import nn" and "from torchvision.models import resnet50", indicating the use of PyTorch and torchvision, and mentions the Adam (Kingma, 2014) optimiser. However, no version numbers are provided for these software components, which are necessary for reproducibility. |
| Experiment Setup | Yes | Table 4 (excerpt): batch size 128; epochs 1000; learning rate 1e-4; optimiser Adam (no weight decay); EMA decay factor 0.9999; training T = 1000; diffusion loss: MSE with noise prediction. |
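The Experiment Setup row lists two training details that are easy to get wrong when reproducing diffusion training: the EMA parameter average (decay 0.9999) and the MSE noise-prediction (epsilon-prediction) loss. The sketch below illustrates both in plain Python; it is a minimal illustration of the reported hyperparameters, not the authors' implementation (see their repository for the actual code), and the function names are our own.

```python
def ema_update(ema_params, params, decay=0.9999):
    """One EMA step: blend current weights into the running average.

    Table 4 reports an EMA decay factor of 0.9999, so each step keeps
    99.99% of the average and mixes in 0.01% of the latest weights.
    """
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]

def noise_prediction_mse(predicted_noise, true_noise):
    """Diffusion loss: mean squared error between the network's predicted
    noise and the noise actually added to the image (epsilon-prediction)."""
    n = len(true_noise)
    return sum((p - t) ** 2 for p, t in zip(predicted_noise, true_noise)) / n

# Illustrative usage with scalar "parameters":
ema = ema_update([1.0], [0.0])        # -> [0.9999]
loss = noise_prediction_mse([0.5, -0.5], [0.0, 0.0])  # -> 0.25
```

In practice the EMA copy of the weights, not the raw training weights, is typically used for sampling, which is why the decay factor matters for reproducing reported image quality.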