Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

InvFusion: Bridging Supervised and Zero-shot Diffusion for Inverse Problems

Authors: Noam Elata, Hyungjin Chung, Jong Chul Ye, Tomer Michaeli, Miki Elad

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on the FFHQ and ImageNet datasets demonstrate state-of-the-art performance. Beyond posterior sampling, we further demonstrate the applicability of our architecture, operating as a general Minimum Mean Square Error predictor, and as a Neural Posterior Principal Component estimator. We evaluate the advantage of InvFusion over existing inverse problem solvers on 64×64 images from the FFHQ [29] and ImageNet [15] datasets. Our model achieves the best CFID among all training-based and zero-shot methods, along with the best FID in all but a single case, suggesting it is SOTA in generating samples from the posterior for our setting.
Researcher Affiliation	Collaboration	Noam Elata Technion Haifa, Israel EMAIL Hyungjin Chung EverEx Seoul, South Korea EMAIL Jong Chul Ye KAIST Daejeon, South Korea EMAIL Tomer Michaeli Technion Haifa, Israel EMAIL Michael Elad Technion Haifa, Israel EMAIL
Pseudocode	Yes	A.2 Pseudo-Code A.2.1 Feature Degradation Layer
    class FeatureDegradation(nn.Module):
        def __init__(self, channels, patch_size, im_channels=3):
            super().__init__()
            self.h = patch_size[0]
            self.w = patch_size[1]
            self.im_channels = im_channels
            self.deg_linear = Linear(channels + 1, channels, bias=True)

        def forward(self, x, degradation, y):
            x = rearrange(x,
                          "... h w (nh nw k c) -> ... k c (h nh) (w nw)",
                          nh=self.h, nw=self.w, c=self.im_channels)
            _y = degradation.H(x)
            _y = torch.cat([y, _y], -2)
            _y = act(self.deg_linear(_y))
            _x = degradation.H_pinv(_y)
            _x = rearrange(_x,
                           "... k c (h nh) (w nw) -> ... h w (nh nw k c)",
                           nh=self.h, nw=self.w)
            return _x
Open Source Code	Yes	Code implementation available at https://github.com/noamelata/InvFusion.
Open Datasets	Yes	Our experiments on FFHQ [29] and ImageNet [15] evaluate various approaches for addressing multiple restoration tasks with a single trained model. Our experiments use data from the FFHQ [29] (CC-BY 4.0 license) and ImageNet [15] datasets.
Dataset Splits	No	The paper mentions using FFHQ [29] and ImageNet [15] datasets for experiments but does not provide specific details on how these datasets were split into training, validation, and test sets (e.g., percentages, sample counts, or references to predefined splits).
Hardware Specification	Yes	We have used 8 Nvidia A40 GPUs (or equivalent hardware) with 49GB of memory for all experiments.
Software Dependencies	No	The paper states: "We implement all models using the official implementation of HDiT [13] Image Transformer Denoiser Model V2 architecture (MIT license)." and "Models are trained and evaluated with the official implementation of the EDM2 [31] training script (licensed as CC BY-NC-SA 4.0)". While specific frameworks are mentioned, the paper does not provide version numbers for underlying key software components such as Python, PyTorch, or CUDA, which are necessary for full reproducibility.
Experiment Setup	Yes	We use default hyperparameters, with changes listed in Tab. 5. The initial patch size is indicated in the table, with the patch size doubling with the progression along the list shown in the Depths column. The attention type column signified which type of attention was used. NAttnx indicates neighborhood attention [23, 24] with a kernel of size x, which performs attention between patches only in an x sized neighborhood. GAttn indicates global attention. ... Models are trained and evaluated with the official implementation of the EDM2 [31] training script (licensed as CC BY-NC-SA 4.0), using the default P_mean = -0.8, P_std = 1.6, and a learning rate of 5e-5, using the default learning rate scheduler. Sampling is done with the default Heun scheduler using a total of 63 NFEs for sampling. Additional experiment specific-training hyperparameters can be found in Tab. 6.
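
Note on the Experiment Setup values: the sketch below illustrates how the quoted P_mean, P_std, and NFE figures are commonly interpreted in EDM-style training and sampling. It is not taken from the paper or its repository. It assumes that training noise levels are drawn log-normally, ln(sigma) ~ N(P_mean, P_std^2), and that a Heun sampler spends two network evaluations per step except on the final step. The function names are hypothetical and chosen only for illustration.

```python
import math
import random

# Values quoted in the Experiment Setup row above.
P_MEAN = -0.8  # assumed to be the mean of ln(sigma)
P_STD = 1.6    # assumed to be the standard deviation of ln(sigma)


def sample_training_sigma(rng: random.Random) -> float:
    """Draw one training noise level, assuming ln(sigma) ~ N(P_MEAN, P_STD^2)."""
    return math.exp(rng.gauss(P_MEAN, P_STD))


def heun_nfe(num_steps: int) -> int:
    """Network evaluations used by a Heun sampler with num_steps steps.

    Assumes two evaluations per step (Euler prediction plus correction),
    with the correction skipped on the final step.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    return 2 * num_steps - 1


def steps_for_nfe(nfe: int) -> int:
    """Invert heun_nfe: the step count implied by an odd NFE budget."""
    if nfe < 1 or nfe % 2 == 0:
        raise ValueError("under this convention the NFE count must be odd")
    return (nfe + 1) // 2


if __name__ == "__main__":
    rng = random.Random(0)
    sigmas = [sample_training_sigma(rng) for _ in range(5)]
    print("example noise levels:", [round(s, 3) for s in sigmas])
    print("steps implied by 63 NFEs:", steps_for_nfe(63))
```

Under these assumptions, the reported 63 NFEs correspond to 32 Heun steps, and the median training noise level is exp(P_mean).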