Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Counterfactual Image Editing with Disentangled Causal Latent Space

Authors: Yushu Pan, Elias Bareinboim

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we empirically validate our theoretical results (Thm.2) and demonstrate the effectiveness of BD-CLS-Edit (Alg.1). Additional experimental details are in App. D. 5.1 Colored MNIST and Bars 5.2 Real World Scenarios Counterfactual Editing
Researcher Affiliation	Academia	Yushu Pan and Elias Bareinboim Causal Artificial Intelligence Lab Columbia University EMAIL
Pseudocode	Yes	Algorithm 1: BD-CLS-Edit
Open Source Code	Yes	The code is released at https://github.com/tree1111/BD-CLS-Edit.
Open Datasets	Yes	We first evaluate the guarantees provided by BD-CLS (Thm.2) on a modified MNIST dataset [14, 36] featuring colored digits and bars.
Dataset Splits	No	For each scenario, the underlying ASCM can be illustrated by a specific text. To illustrate, ... We construct data from the observed distribution P(V, I) and initial images i for editing tasks in three steps: ... 3. Select editing inputs. Collect a set of images that all contain a specific feature x to serve as initial inputs. Following this procedure, the resulting observed distribution P(V, I) is guaranteed to be compatible with GV,I. In addition, all initial images share the same original feature values, which differ from the intervened ones (x ≠ x′) assumed in this work. ... We edit 120 images for each real-world task and report LPIPS [62] to quantify the degree of structure preservation.
Hardware Specification	Yes	We train the model on a single NVIDIA H100 GPU for 100 epochs. ... The experiments are also conducted on a single NVIDIA H100 GPU.
Software Dependencies	No	The generator and discriminator are trained with a learning rate of 10^-4, and are optimized with Adam optimizer [27]. All training processes are performed with a batch size of 100. The model architecture of conditional diffusion follows the implementation in [20]. Specifically, we use four feature map resolutions (32×32 to 4×4). ... We use Stable Diffusion XL [43], and all editing is performed in the latent space, after encoding the input image.
Experiment Setup	Yes	For classifier-free guidance, we fix the parameter ω (Eq. 18) at 7.5. Other hyperparameters in Alg. 3 are given as follows. The total inference steps are set to 200. T of length 40 is randomly sampled from {1, ..., 200}. We manually tune the hyperparameters for BD-CLS-Edit, including learning rate γ, optimization iteration number nmax, and clipping value θmax. Specifically, we compare the combination of γ ∈ {1e-1, 1e-2, 1e-3, 1e-4}, nmax ∈ [2, 10, 20], and θmax ∈ {1.2, 1.5, 2.0}. We found that γ = 1e-2 and nmax = 10 lead to the best BD-CLS-Edit performance considering both effectiveness and optimization time. BD-CLS-Edit is relatively robust to θmax. The initial θ is set to 0 for θT through θT−50 and the others are initialized as 1. θ are optimized individually for each input image. The adjusted parameters follow the coefficients in DDS [18]. The learning rate µ is set as 0.1 and the optimization is performed with SGD.
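
The hyperparameter search quoted in the Experiment Setup row can be sketched as a plain grid enumeration. This is a minimal illustration, not the authors' code: the function and constant names are hypothetical, and the choice to sample the 40 timesteps without replacement and return them sorted is an assumption, since the quoted text does not specify it.

```python
import itertools
import random

# Values quoted in the Experiment Setup row; all names here are hypothetical.
LEARNING_RATES = [1e-1, 1e-2, 1e-3, 1e-4]  # gamma
MAX_ITERATIONS = [2, 10, 20]               # n_max
CLIP_VALUES = [1.2, 1.5, 2.0]              # theta_max
TOTAL_INFERENCE_STEPS = 200
NUM_SAMPLED_STEPS = 40


def hyperparameter_grid():
    """Return every (gamma, n_max, theta_max) combination as a dict."""
    return [
        {"gamma": g, "n_max": n, "theta_max": t}
        for g, n, t in itertools.product(
            LEARNING_RATES, MAX_ITERATIONS, CLIP_VALUES
        )
    ]


def sample_timesteps(seed=0):
    """Draw 40 distinct timesteps from {1, ..., 200}.

    Assumption: sampling is without replacement and the result is sorted.
    """
    rng = random.Random(seed)
    population = range(1, TOTAL_INFERENCE_STEPS + 1)
    return sorted(rng.sample(population, NUM_SAMPLED_STEPS))


if __name__ == "__main__":
    grid = hyperparameter_grid()
    print(f"{len(grid)} configurations to compare")
    print("first sampled timesteps:", sample_timesteps()[:5])
```

Each dictionary in the grid would be passed to one tuning run; the reported best setting (γ = 1e-2, nmax = 10) is one of these entries for any of the three θmax values.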