Rethinking Score Distillation as a Bridge Between Image Distributions
Authors: David McAllister, Songwei Ge, Jia-Bin Huang, David Jacobs, Alexei Efros, Aleksander Holynski, Angjoo Kanazawa
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we test our proposed method on several generation problems where SDS is adopted. We compare against SDS and other task-specific baselines. |
| Researcher Affiliation | Academia | David McAllister (1), Songwei Ge (2), Jia-Bin Huang (2), David W. Jacobs (2), Alexei A. Efros (1), Aleksander Holynski (1), Angjoo Kanazawa (1); (1) UC Berkeley, (2) University of Maryland |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | We will release the code with an open-source license when the paper is published. |
| Open Datasets | Yes | We use the MS-COCO [36] dataset for the evaluation. Consistent with the prior study [3], we randomly sample 5K captions from the COCO validation set as conditions for generating images. |
| Dataset Splits | Yes | We use the MS-COCO [36] dataset for the evaluation. Consistent with the prior study [3], we randomly sample 5K captions from the COCO validation set as conditions for generating images. |
| Hardware Specification | No | The paper states that 'each run of our text-to-image generation with baseline VSD takes 1.3K GPU hours' in the NeurIPS checklist, indicating GPU usage, but it does not specify GPU models or any other hardware details in the main text or experimental setup sections. |
| Software Dependencies | No | The paper mentions software components such as the 'stable-diffusion-v2-1-base model', 'LoRA', and the 'threestudio [19] repository', but does not provide specific version numbers for these or other ancillary software dependencies. |
| Experiment Setup | Yes | For all the methods, we use the same learning rate of 0.01 and optimize for 2,500 steps, where we generally observe convergence. We compute the zero-shot FID [21] and CLIP FID scores [31] between these generated images and the ground truth images. We also report results generated by DDIM with 20 steps as a lower bound for reference. |