Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

GeoComplete: Geometry-Aware Diffusion for Reference-Driven Image Completion

Authors: Beibei Lin, Tingting Chen, Robby Tan

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments show that GeoComplete achieves a 17.1% PSNR improvement over state-of-the-art methods, significantly boosting geometric accuracy while maintaining high visual quality. Extensive Experiments: GeoComplete significantly outperforms existing methods in both structural accuracy and visual fidelity.
Researcher Affiliation	Academia	Beibei Lin Tingting Chen Robby T. Tan National University of Singapore EMAIL, EMAIL
Pseudocode	No	The paper describes the methodology through text and figures, such as Figure 2 and Figure 3, but does not include a clearly labeled pseudocode or algorithm block.
Open Source Code	No	The code will be publicly available upon acceptance of this paper, along with instructions for reproducing the main experimental results.
Open Datasets	Yes	In our experiments, we follow the evaluation protocol of [40] and test on two challenging reference-based image completion datasets: RealBench and QualBench.
Dataset Splits	No	The paper states that it evaluates on the 'RealBench' and 'QualBench' datasets and that it fine-tunes the model 'For each scene'. However, it does not explicitly state how these datasets are split into training, validation, and test sets for overall model training and evaluation.
Hardware Specification	Yes	All experiments are conducted on a server equipped with four NVIDIA GPUs, each with 24 GB of memory.
Software Dependencies	Yes	In LangSAM, we employ SAM 2.1-Large [32] for segmentation. ... Our diffusion model is built upon Stable Diffusion 2 Inpainting [37].
Experiment Setup	Yes	For each scene, we fine-tune the model for 2,000 iterations with a batch size of 16. During training, all reference and target images are resized to a resolution of 512 x 512. The LoRA rank is set to 8 to balance adaptation capacity and training efficiency.
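
The settings quoted in the Experiment Setup and Hardware Specification rows can be collected into a small configuration object for anyone attempting a reproduction. The following is a minimal sketch, not the authors' code: the class name `FineTuneConfig`, its field names, and the `samples_seen` helper are hypothetical, and the sketch assumes the reported batch size of 16 is the global batch size, which the quoted text does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hypothetical container for the per-scene settings quoted in the table."""

    iterations: int = 2000    # reported: 2,000 fine-tuning iterations per scene
    batch_size: int = 16      # reported; assumed here to be the global batch size
    image_size: int = 512     # reported: images resized to 512 x 512
    lora_rank: int = 8        # reported LoRA rank
    num_gpus: int = 4         # reported: four NVIDIA GPUs
    gpu_memory_gb: int = 24   # reported: 24 GB of memory per GPU

    def samples_seen(self) -> int:
        """Training samples processed per scene under the global-batch assumption."""
        return self.iterations * self.batch_size


config = FineTuneConfig()
print(config.samples_seen())  # 2000 * 16 = 32000
```

Values not stated in the table, such as the learning rate, optimizer, and GPU model, are deliberately left out rather than guessed.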