Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Increasing the Utility of Synthetic Images through Chamfer Guidance

Authors: Nicola Dall'Asen, Xiaofeng Zhang, Reyhane Askari Hemmat, Melissa Hall, Jakob J. Verbeek, Adriana Romero-Soriano, Michal Drozdzal

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct our ablation studies on LDM1.5, using the ImageNet-1k dataset. The goal is to understand the robustness of our Chamfer Guidance to the relevant hyperparameters, i.e., ω from Equation (2) (in Table 4) and the strength of the Chamfer Guidance γ from Equation (6) (in the Appendix). CFG ablation. Table 4 presents the impact of varying the CFG scale ω on our method. The results clearly demonstrate that a moderate ω value of 2.0 with k = 32 guiding images achieves optimal performance, yielding the highest F1 score (0.931) and competitive precision (0.950) while achieving high coverage (0.912). This balance is crucial for generating both accurate and diverse images.
Researcher Affiliation	Collaboration	Nicola Dall'Asen (1,2), Xiaofeng Zhang (3,4,5), Reyhane Askari-Hemmat (4), Melissa Hall (4), Jakob Verbeek (4), Adriana Romero-Soriano (3,4,6,7), Michal Drozdzal (4). Affiliations: (1) University of Trento; (2) University of Pisa; (3) Mila Québec AI Institute; (4) FAIR at Meta; (5) Université de Montréal; (6) McGill University; (7) Canada CIFAR AI Chair.
Pseudocode	No	The paper describes methods using mathematical equations and descriptive text but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code	No	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: We put an effort to make the paper self contained to facilitate reproducibility.
Open Datasets	Yes	We utilize three publicly available datasets. For the object-centric setting, we use ImageNet-1k [13], a large-scale image classification dataset... For the geodiversity representation, we use GeoDE [44] and Dollar Street [22]... For downstream utility, we employ ImageNet-1k [13]. We further employ ImageNet-V2 [46], ImageNet-Sketch [62], ImageNet-R [27], and ImageNet-A [28] to measure out-of-distribution generalization.
Dataset Splits	Yes	We report the accuracy of a ViT-B [16] classifier trained on this synthetic data and tested on real validation data. We follow the same evaluation protocol of c-VSG. For each k, we report the metrics corresponding to the best F1 score w.r.t. the validation set of each dataset. We report used hyperparameters and data splits in the Experiment section when novel. For the geographical diversity setup we refer to hyperparameters and data splits of previous works.
Hardware Specification	Yes	For all the experiments with k ≤ 8, we use a single H100 GPU to perform training and inference. We use multiple GPUs for k = [16, 32]. This translates to 4 s to generate a sample with LDM3.5M on an RTX A6000.
Software Dependencies	No	All experiments are implemented using the diffusers library [61], using the default samplers with 40 denoising steps. The Chamfer distance implementation is from the PyTorch3D library [45].
Experiment Setup	Yes	All experiments are implemented using the diffusers library [61], using the default samplers with 40 denoising steps. For the latent projection of our Chamfer Guidance, we primarily use the DINOv2 [42] (ViT-L) feature space... As in c-VSG [25], we set the inference-time guidance frequency to G_freq = 5, i.e., we apply Chamfer Guidance once every five denoising steps. We use a constant learning rate of 10⁻⁶ across all experiments. For LDM1.5, we fine-tune the entire U-Net backbone, while for LDM3.5M we employ LoRA [34] fine-tuning with a rank r = 4 applied on the key, query, value, and output layers of attention modules. For ReFL hyperparameters, we use the official implementation and set λ = 10⁻³, T = 40, T1 = 30, T2 = 39.
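
The Research Type row quotes an F1 score of 0.931 alongside a precision of 0.950 and a coverage of 0.912. The short check below assumes that F1 is the harmonic mean of precision and coverage; the excerpt does not state the formula, so treat this as a plausibility check rather than the paper's definition. The function name is illustrative.

```python
def f1_score(precision: float, coverage: float) -> float:
    """Harmonic mean of precision and coverage (assumed definition of F1)."""
    return 2 * precision * coverage / (precision + coverage)


# Values quoted in the Research Type row (omega = 2.0, k = 32).
f1 = f1_score(0.950, 0.912)
print(round(f1, 3))  # 0.931
```

Under that assumption the three quoted figures are mutually consistent.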
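
To make the Experiment Setup row more concrete, here is a minimal pure-Python sketch of a symmetric Chamfer distance between two sets of feature vectors, plus a guidance schedule with 40 denoising steps and a frequency of 5. It is neither the paper's code nor the PyTorch3D implementation. The choice of squared distances, the mean reduction, the step offset (guidance on steps 0, 5, 10, ...), and all names and toy data are assumptions made for illustration.

```python
def squared_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def chamfer_distance(xs, ys):
    """Symmetric Chamfer distance between two non-empty sets of vectors.

    Each direction averages the squared distance from every point to its
    nearest neighbour in the other set; the two directions are summed.
    """
    x_to_y = sum(min(squared_dist(x, y) for y in ys) for x in xs) / len(xs)
    y_to_x = sum(min(squared_dist(y, x) for x in xs) for y in ys) / len(ys)
    return x_to_y + y_to_x


def guided_steps(num_steps=40, g_freq=5):
    """Indices of denoising steps that receive guidance.

    Assumption: guidance fires on step indices that are multiples of g_freq,
    starting at step 0. The source only says "once every five steps".
    """
    return [t for t in range(num_steps) if t % g_freq == 0]


# Hypothetical 2-D points standing in for real image embeddings.
generated = [(0.0, 0.0), (1.0, 0.0)]
reference = [(0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]

cd = chamfer_distance(generated, reference)
steps = guided_steps()
print(cd)
print(steps)
```

With these defaults, guidance is applied on 8 of the 40 steps. In the toy data, the outlying reference point (5.0, 5.0) dominates the reference-to-generated term, which illustrates how a Chamfer-style objective penalizes reference samples that no generated sample is close to.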