Inner Classifier-Free Guidance and Its Taylor Expansion for Diffusion Models
Authors: Shikun Sun, Longhui Wei, Zhicai Wang, Zixuan Wang, Junliang Xing, Jia Jia, Qi Tian
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The evaluation of the results, presented in Tables 1, 2, and 3, is based on two metrics: the Fréchet Inception Distance (FID) (Heusel et al., 2017) and the CLIP Score (Radford et al., 2021). The FID metric is calculated by comparing 10,000 generated images with the MS-COCO (Lin et al., 2014) validation dataset, measuring the distance between the distribution of generated images and the distribution of the validation dataset. The CLIP Score is computed between the 10,000 generated images and their corresponding captions by the model ViT-L/14 (Radford et al., 2021), reflecting the similarity between the images and the textual descriptions. (A hedged sketch of this metric setup appears after the table.) |
| Researcher Affiliation | Collaboration | 1Tsinghua University, 2BNRist, 3Huawei Inc., 4University of Science and Technology of China |
| Pseudocode | Yes | Algorithm 1 Training policy for ICFG; Algorithm 2 Strict sample algorithm for second-order ICFG; Algorithm 3 Non-strict sample algorithm for second-order ICFG |
| Open Source Code | No | The paper does not provide a direct link or explicit statement about the public release of its source code. |
| Open Datasets | Yes | The FID metric is calculated by comparing 10,000 generated images with the MS-COCO (Lin et al., 2014) validation dataset, measuring the distance between the distribution of generated images and the distribution of the validation dataset. |
| Dataset Splits | Yes | The FID metric is calculated by comparing 10,000 generated images with the MS-COCO (Lin et al., 2014) validation dataset, measuring the distance between the distribution of generated images and the distribution of the validation dataset. |
| Hardware Specification | Yes | We conducted our experiments on an NVIDIA GeForce RTX 3090, using a batch size of 4. |
| Software Dependencies | No | The paper mentions software components such as PNDM, the Adam optimizer, U-Net, the CLIP model, Stable Diffusion v1.5, and Low-Rank Adaptation (LoRA), but does not provide version numbers for them, nor does it specify the programming language or frameworks used. |
| Experiment Setup | Yes | The sampling algorithm employed is PNDM (Liu et al., 2022), and the default number of timesteps is 50. (A hedged sketch of this sampling setup appears after the table.) |
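The evaluation protocol quoted in the table (FID against the MS-COCO validation set, CLIP Score with ViT-L/14) can be approximated with off-the-shelf tooling. Below is a minimal sketch using the torchmetrics implementations; the paper does not name the FID/CLIP implementation it used, so the library choice, the checkpoint identifier, and the stand-in tensors are assumptions rather than details from the paper.

```python
# Hedged sketch: the paper does not name its FID / CLIP Score implementation.
# torchmetrics is used here as one plausible choice; the checkpoint name and
# the random stand-in tensors are assumptions, not details from the paper.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.multimodal.clip_score import CLIPScore

fid = FrechetInceptionDistance(feature=2048)  # Inception-v3 pooled features
clip_score = CLIPScore(model_name_or_path="openai/clip-vit-large-patch14")  # ViT-L/14

# Both metrics expect uint8 image tensors of shape (N, 3, H, W).
# The paper uses 10,000 images; 16 random stand-ins are used here for brevity.
real_images = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)  # MS-COCO val stand-in
fake_images = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)  # generated stand-in
captions = ["a photo of a cat"] * 16  # corresponding captions

fid.update(real_images, real=True)    # accumulate statistics of the validation set
fid.update(fake_images, real=False)   # accumulate statistics of the generated set
print("FID:", fid.compute().item())

clip_score.update(fake_images, captions)  # image-text similarity via ViT-L/14
print("CLIP Score:", clip_score.compute().item())
```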
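The experiment-setup row (PNDM sampling, 50 timesteps, Stable Diffusion v1.5) corresponds to a standard text-to-image pipeline. The sketch below reproduces only that baseline configuration with Hugging Face diffusers; the paper's ICFG modification to classifier-free guidance is not part of diffusers, and the checkpoint id and guidance scale shown are assumptions.

```python
# Hedged sketch of the quoted sampling setup (PNDM, 50 timesteps, Stable
# Diffusion v1.5) using Hugging Face diffusers. This is the baseline
# classifier-free-guidance pipeline only; the paper's ICFG method is not
# implemented here. Checkpoint id and guidance scale are assumptions.
import torch
from diffusers import StableDiffusionPipeline, PNDMScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")  # e.g. an RTX 3090, matching the paper's hardware note

# PNDM is the sampler named in the paper; configure it explicitly.
pipe.scheduler = PNDMScheduler.from_config(pipe.scheduler.config)

# guidance_scale > 1 enables standard classifier-free guidance:
#   eps = eps_uncond + w * (eps_cond - eps_uncond)
image = pipe(
    "a photo of an astronaut riding a horse",  # example prompt, not from the paper
    num_inference_steps=50,  # default timestep count quoted in the table
    guidance_scale=7.5,      # assumed; the paper studies guidance strengths
).images[0]
image.save("sample.png")
```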