Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

SplitFlow: Flow Decomposition for Inversion-Free Text-to-Image Editing

Authors: Sung-Hoon Yoon, Minghan Li, Gaspard Beaudouin, Congcong Wen, Muhammad Rafay Azhar, Mengyu Wang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results demonstrate that our method outperforms existing zero-shot editing approaches in terms of semantic fidelity and attribute disentanglement. We evaluate our method on Prompt-based Image Editing (PIE) Benchmark [10], which contains 700 images of natural and artificial scenes. To demonstrate the effectiveness of the proposed SplitFlow, we conducted experiments as shown in Table 1.
Researcher Affiliation	Academia	(1) Harvard AI and Robotics Lab, Harvard University; (2) École des Ponts, Institut Polytechnique de Paris; (3) New York University Abu Dhabi; (4) Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University
Pseudocode	No	A detailed algorithmic description of SplitFlow is included in the supplementary material.
Open Source Code	Yes	The code is available at https://github.com/Harvard-AI-and-Robotics-Lab/SplitFlow. We provide open access to code (github: https://github.com/Harvard-AI-and-Robotics-Lab/SplitFlow), with scripts to run the code.
Open Datasets	Yes	Dataset. We evaluate our method on Prompt-based Image Editing (PIE) Benchmark [10], which contains 700 images of natural and artificial scenes.
Dataset Splits	No	The paper mentions evaluating on the PIE-Bench benchmark [10], which contains 700 images, but does not specify how these images are split into training, validation, or test sets for reproducibility within the main text.
Hardware Specification	No	Additional details on the computational cost, including memory usage and GPU specifications, are provided in the supplementary materials.
Software Dependencies	No	The paper mentions using Stable Diffusion (SD3, SD3.5) [4] and Mistral-7B [9], which are models, but does not provide specific version numbers for software libraries or dependencies like Python, PyTorch, or CUDA.
Experiment Setup	Yes	Implementation Details. To show the effectiveness of the proposed method and for a fair comparison with the prior works, we employed the commonly used Stable Diffusion (SD3, SD3.5) [4] rectified flow model. By following the protocol of the baseline [11], we use the same T = 50 steps with η_max = 33, which skips the first one-third steps. The CFG values for the source and target are set to 3.5 and 13.5, respectively. The decomposition η_dec is set to 28, which lasts for 5 steps (η_max − η_dec).
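
The step arithmetic in the Experiment Setup row can be checked with a minimal sketch. The class name `EditSchedule` and its field names are hypothetical and do not come from the paper or its repository. The sketch also assumes that steps are counted down from T, so that η_max = 33 leaves T − η_max steps skipped at the start; the quoted text does not state the indexing convention explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditSchedule:
    """Hypothetical container for the hyperparameters quoted above."""

    total_steps: int = 50     # T
    eta_max: int = 33         # step index at which editing begins (assumed countdown indexing)
    eta_dec: int = 28         # step index at which the decomposition phase ends
    cfg_source: float = 3.5   # CFG value for the source prompt
    cfg_target: float = 13.5  # CFG value for the target prompt

    @property
    def skipped_steps(self) -> int:
        # Steps skipped before editing starts (roughly one-third of T).
        return self.total_steps - self.eta_max

    @property
    def decomposition_steps(self) -> int:
        # Length of the decomposition phase: eta_max - eta_dec.
        return self.eta_max - self.eta_dec

    @property
    def remaining_steps(self) -> int:
        # Steps left after the decomposition phase under the assumed indexing.
        return self.eta_dec


schedule = EditSchedule()
print(schedule.skipped_steps, schedule.decomposition_steps, schedule.remaining_steps)
# prints: 17 5 28
```

Under these assumptions the three phases add up to T = 50, and the decomposition phase lasts 33 − 28 = 5 steps, matching the figure quoted in the row above.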