Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

FIG: Flow with Interpolant Guidance for Linear Inverse Problems

Authors: Yici Yan, Yichi Zhang, Xiangming Meng, Zhizhen Zhao

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	4 EXPERIMENTS 4.1 EXPERIMENTAL SETUP Datasets. We conduct experiments on 3 natural image datasets: CelebA-HQ (Karras et al., 2018), LSUN-Bedroom (Yu et al., 2015), and AFHQ-Cat (Choi et al., 2020). Metrics. For the quantitative comparison, we use the perceptual Learned Perceptual Image Patch Similarity (LPIPS) distance (Zhang et al., 2018), along with two standard metrics: peak signal-to-noise-ratio (PSNR), and structural similarity index (SSIM). Table 1: Quantitative comparison (PSNR, SSIM, LPIPS) of different algorithms for different tasks on the CelebA-HQ 256 × 256 test dataset.
Researcher Affiliation	Academia	1Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign 2Department of Statistics, University of Illinois at Urbana-Champaign 3ZJU-UIUC Institute, Zhejiang University
Pseudocode	Yes	Algorithm 1 Flow with Interpolant Guidance (FIG) Algorithm 2 FIG+
Open Source Code	Yes	Our code is available at: https://riccizz.github.io/FIG/.
Open Datasets	Yes	Datasets. We conduct experiments on 3 natural image datasets: CelebA-HQ (Karras et al., 2018), LSUN-Bedroom (Yu et al., 2015), and AFHQ-Cat (Choi et al., 2020).
Dataset Splits	Yes	All images are taken from the official test data splits and are preprocessed to the size of 256 × 256 × 3. ... We conduct experiments on a separate validation set (100 images from the official test data split) using different values of K and report results for K = 1 and the optimal K.
Hardware Specification	Yes	All experiments are conducted on a single NVIDIA RTX A6000 GPU for reconstructing one image.
Software Dependencies	No	The paper mentions using pre-trained Rectified Flow models and EDM (Karras et al., 2022) as base models, and discusses baselines like DPS, DMPS, OT-ODE, DDNM/DDNM+, and DAPS. It also references
Experiment Setup	Yes	For super-resolution, we apply 4× bicubic downsampling across all datasets. For deblurring tasks, we use Gaussian blurring with a kernel size of 61 × 61 and a standard deviation of 3.0, and motion blurring with the same kernel size but a standard deviation of 0.5. For inpainting, we perform random inpainting by masking out 90% of the total pixels. ... for all tasks above, we add a measurement Gaussian noise $n \sim \mathcal{N}(0, \sigma_n^2 I)$ with $\sigma_n = 0.05$. ... Leveraging the advantages of flow matching models, we fine-tune the baseline methods to ensure they all achieve their best performance at 50 NFEs except for OT-ODE. ... The parameters c, w, K are constants, with c and K being task-specific, governing the balance between unconditional and conditional updates.
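
The experiment setup quoted in the table can be illustrated with a small sketch. The following is not the authors' code: it is a minimal standard-library Python illustration, under stated assumptions, of three ingredients named above: a normalized 61 × 61 Gaussian blur kernel with standard deviation 3.0, a random inpainting mask that drops 90% of the pixels, and additive Gaussian measurement noise with σ_n = 0.05. It also includes the PSNR metric. All function names, the single-channel nested-list image layout, the [0, 1] intensity range, and the choice to add noise to masked pixels as well are assumptions made for illustration.

```python
import math
import random


def gaussian_kernel(size=61, sigma=3.0):
    """Return a normalized size x size Gaussian kernel as nested lists."""
    half = size // 2
    weights = [math.exp(-((i - half) ** 2) / (2.0 * sigma ** 2)) for i in range(size)]
    # The 2-D kernel is the outer product of two 1-D kernels, so its sum
    # is the square of the 1-D sum.
    total = sum(weights) ** 2
    return [[wi * wj / total for wj in weights] for wi in weights]


def random_inpainting_mask(height, width, drop_fraction=0.9, rng=None):
    """Return a 0/1 mask with drop_fraction of the pixels set to 0."""
    rng = rng or random.Random(0)
    n_pixels = height * width
    n_drop = round(drop_fraction * n_pixels)
    dropped = set(rng.sample(range(n_pixels), n_drop))
    return [[0 if r * width + c in dropped else 1 for c in range(width)]
            for r in range(height)]


def measure_inpainting(image, mask, sigma_n=0.05, rng=None):
    """Compute y = mask * x + n with n ~ N(0, sigma_n^2 I), elementwise.

    Assumption: noise is added to every pixel, including masked ones.
    """
    rng = rng or random.Random(0)
    return [[m * x + rng.gauss(0.0, sigma_n) for x, m in zip(row, mask_row)]
            for row, mask_row in zip(image, mask)]


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB for images with peak value max_value."""
    squared_errors = [(a - b) ** 2
                      for ref_row, est_row in zip(reference, estimate)
                      for a, b in zip(ref_row, est_row)]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

For example, `random_inpainting_mask(256, 256)` produces a mask for one channel of a 256 × 256 image, and `measure_inpainting(image, mask)` returns the corresponding noisy, masked measurement. A real pipeline would more likely use an array library and operate on all three colour channels.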