Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Towards a Golden Classifier-Free Guidance Path via Foresight Fixed Point Iterations

Authors: Kaibo Wang, Jianda Mao, Tong Wu, Yang Xiang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments across diverse datasets and model architectures validate the superiority of FSG over state-of-the-art methods in both image quality and computational efficiency. Our work offers novel perspectives for conditional guidance and unlocks the potential of adaptive design.
Researcher Affiliation	Academia	Kaibo Wang (1), Jianda Mao (1), Tong Wu (1), Yang Xiang (1, 2). (1) Department of Mathematics, The Hong Kong University of Science and Technology; (2) Shenzhen-Hong Kong Collaborative Innovation Research Institute, HKUST. EMAIL, EMAIL
Pseudocode	Yes	Algorithm 1: Foresight Guidance (FSG); Algorithm 2: CFG K; Algorithm 3: CFG++ K
Open Source Code	Yes	The codes are available at https://github.com/Ka1b0/Foresight-Guidance.
Open Datasets	Yes	Datasets. We assess generation performance across four benchmark datasets: DrawBench [25], Pick-a-Pic [17], GenEval [10], and PartiPrompts [32]. Detailed experimental setups are provided in Appendix D.1, and results for PartiPrompts are included in Appendix D.2. Class conditional generation. To investigate whether enhanced fixed point iterations reduce diversity, we conduct experiments on the ImageNet [6] 256×256 conditional generation task using DiT [21] models, generating 1K images per class (totaling 50K images).
Dataset Splits	No	The paper does not explicitly provide traditional training/test/validation dataset splits. It describes how subsets of benchmark datasets are used for evaluation (e.g., "We use the first 100 prompts from this dataset to test model performance" for Pick-a-Pic, or "We randomly pick 100 prompts" for PartiPrompts), which implies using pre-existing test/evaluation sets or specific evaluation methodologies rather than defining new splits for model training.
Hardware Specification	Yes	All experiments are performed on one NVIDIA A6000 GPU.
Software Dependencies	No	The paper mentions models and samplers (e.g., SDXL, DDIM, DiT, DDPM) but does not provide specific version numbers for any programming languages, libraries, or software packages used in the implementation.
Experiment Setup	Yes	Hyper-parameters. For Classifier-Free Guidance (CFG), we adopted a guidance strength w = 5.5, while CFG++ used λ = 0.6. Both methods utilized 50 inference steps. In Z-Sampling, forward guidance strength was set to 5.5, and reverse guidance strength to 0. Following the setting in [1], reflective sampling was applied during the first 12/25 steps for NFE=50, 25/50 steps for NFE=100, and all 50 steps for NFE=150. For Resampling, configurations varied by NFE: at NFE=50, 25 inference steps with one resample per step; for NFE=100/150, 50 steps with 1 or 2 resamples per step, respectively. For our Foresight Guidance (FSG) method, we set λ = 1.0 for NFE=50/100 and λ = 0.7 for NFE=150. We allocate fixed-point iterations (ti, ti, Ki) using a stage-wise strategy that prioritizes early timesteps: