Rethinking Score Distillation as a Bridge Between Image Distributions

Authors: David McAllister, Songwei Ge, Jia-Bin Huang, David Jacobs, Alexei Efros, Aleksander Holynski, Angjoo Kanazawa

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we test our proposed method on several generation problems where SDS is adopted. We compare against SDS and other task-specific baselines.
Researcher Affiliation | Academia | David McAllister (1), Songwei Ge (2), Jia-Bin Huang (2), David W. Jacobs (2), Alexei A. Efros (1), Aleksander Holynski (1), Angjoo Kanazawa (1); (1) UC Berkeley, (2) University of Maryland
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | We will release the code with an open-source license when the paper is published.
Open Datasets | Yes | We use the MS-COCO [36] dataset for the evaluation. Consistent with the prior study [3], we randomly sample 5K captions from the COCO validation set as conditions for generating images.
Dataset Splits | Yes | We use the MS-COCO [36] dataset for the evaluation. Consistent with the prior study [3], we randomly sample 5K captions from the COCO validation set as conditions for generating images.
Hardware Specification | No | The paper notes in the NeurIPS checklist that 'each run of our text-to-image generation with baseline VSD takes 1.3K GPU hours', indicating GPU usage, but it does not specify GPU models or other hardware details in the main text or experimental setup sections.
Software Dependencies | No | The paper mentions software components such as the 'stable-diffusion-v2-1-base model', 'LoRA', and the 'threestudio [19] repository', but does not provide version numbers for these or other ancillary software dependencies.
Experiment Setup | Yes | For all the methods, we use the same learning rate of 0.01 and optimize for 2,500 steps, where we generally observe convergence. We compute the zero-shot FID [21] and CLIP FID scores [31] between these generated images and the ground truth images. We also report results generated by DDIM with 20 steps as a lower bound for reference.
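The FID and CLIP FID metrics cited in the setup both reduce to a Fréchet distance between two Gaussians fit to image features (Inception-v3 features for FID, CLIP features for CLIP FID). As a minimal illustration of that distance, not the paper's evaluation pipeline, the sketch below assumes diagonal covariances so that plain NumPy suffices; real FID implementations fit full covariance matrices and take a matrix square root of their product.

```python
import numpy as np

def frechet_distance_diag(mu1, var1, mu2, var2):
    """Frechet distance between two axis-aligned Gaussians.

    For diagonal covariances the general formula
        ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2})
    reduces to an elementwise expression over the variances.
    """
    diff = mu1 - mu2
    return float(diff @ diff + np.sum(var1 + var2 - 2.0 * np.sqrt(var1 * var2)))

# Identical feature distributions give a distance of 0;
# shifting every mean by 1 in 4 dimensions gives ||diff||^2 = 4.
print(frechet_distance_diag(np.zeros(4), np.ones(4), np.zeros(4), np.ones(4)))  # 0.0
print(frechet_distance_diag(np.zeros(4), np.ones(4), np.ones(4), np.ones(4)))   # 4.0
```

In practice one would extract features for the 5K generated images and the ground-truth COCO images, fit a mean and full covariance to each set, and evaluate the general formula.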