kNN-Diffusion: Image Generation via Large-Scale Retrieval
Authors: Shelly Sheynin, Oron Ashual, Adam Polyak, Uriel Singer, Oran Gafni, Eliya Nachmani, Yaniv Taigman
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As evaluated by human studies and automatic metrics, our method achieves state-of-the-art results compared to existing approaches that train text-to-image generation models using an images-only dataset. |
| Researcher Affiliation | Industry | Shelly Sheynin, Oron Ashual, Adam Polyak, Uriel Singer, Oran Gafni, Eliya Nachmani, Yaniv Taigman (Equal Contribution), Meta AI, {shellysheynin,oron}@meta.com |
| Pseudocode | Yes | In Algorithm 1 we include pseudocode of the core of the implementation of the retrieval database. (A hedged sketch of such an index follows this table.) |
| Open Source Code | No | The paper mentions using other authors' official code for baselines (e.g., Text2Live, Textual Inversion, LAFITE) but does not provide an explicit statement about releasing their own source code for kNN-Diffusion or a link to it. |
| Open Datasets | Yes | For photo-realistic experiments, our model was trained only on the images (omitting the text) of a modified version of the Public Multimodal Dataset (PMD) used by FLAVA (Singh et al., 2021). ... The modified PMD dataset is composed of the following set of publicly available text-image datasets: SBU Captions (Ordonez et al., 2011), Localized Narratives (Pont-Tuset et al., 2020), Conceptual Captions (Sharma et al., 2018), Visual Genome (Krishna et al., 2016), Wikipedia Image Text (Srinivasan et al., 2021), Conceptual Captions 12M (Changpinyo et al., 2021), Red Caps (Desai et al., 2021), and a filtered version of YFCC100M (Thomee et al., 2015). |
| Dataset Splits | Yes | We follow the evaluation protocol of LAFITE, reporting our results on 30,000 images from the MS-COCO validation set without training, nor using its training partition in the kNN index. |
| Hardware Specification | Yes | Table 3: Training details of our models - #GPUs: 128 A100 (Discrete model) and 200 A100 (Continuous model). |
| Software Dependencies | No | The paper mentions several software components like FAISS, VQ-VAE, and VQGAN, along with citations to their respective papers or authors. However, it does not specify explicit version numbers for these software libraries or for core dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Table 3: Training details of our models - Number of nearest neighbors 10, Diffusion steps (100, 1000), Sampling steps (100, 250), Dropout 0.1, Weight decay 4.5e-2, Batch size (512, 1600), Iterations (150K, 500K), Learning rate (4.05e-4, 1.4e-4), Optimizer AdamW, Adam β2 (0.96, 0.9999), Adam ϵ 1.0e-8, EMA decay (0.99, 0.9999). (A hedged configuration sketch follows this table.) |
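
The Pseudocode row points to Algorithm 1's retrieval database, and the Software Dependencies row confirms FAISS as the search backend. Below is a minimal sketch of what such a kNN index could look like, assuming CLIP-style joint image-text embeddings of dimension 512 and an exact inner-product index; the function names, the `IndexFlatIP` choice, and the embedding dimension are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a kNN retrieval database in the spirit of Algorithm 1.
# FAISS and k=10 come from the paper; everything else here is assumed.
import faiss
import numpy as np

def build_index(image_embeddings: np.ndarray) -> faiss.Index:
    """Index L2-normalized embeddings so inner product equals cosine similarity."""
    embeddings = np.ascontiguousarray(image_embeddings, dtype=np.float32)
    faiss.normalize_L2(embeddings)                  # in-place normalization
    index = faiss.IndexFlatIP(embeddings.shape[1])  # exact search; index type is an assumption
    index.add(embeddings)
    return index

def retrieve_knn(index: faiss.Index, query_embedding: np.ndarray, k: int = 10) -> np.ndarray:
    """Return the row ids of the k nearest neighbors (k=10 per Table 3)."""
    query = np.ascontiguousarray(query_embedding.reshape(1, -1), dtype=np.float32)
    faiss.normalize_L2(query)
    _, neighbor_ids = index.search(query, k)
    return neighbor_ids[0]

# Toy usage: 10,000 database vectors of (assumed) CLIP dimension 512.
database = np.random.randn(10_000, 512).astype(np.float32)
index = build_index(database)
neighbors = retrieve_knn(index, np.random.randn(512).astype(np.float32))
print(neighbors)  # ids of the 10 nearest database images
```

In the paper's setup the query would be an image embedding during training and a text embedding in the same joint space at inference, which is what lets the model train on images only.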
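
The Experiment Setup row flattens Table 3 into one line. Reading each tuple as (Discrete model, Continuous model) values, consistent with the #GPUs row, a minimal PyTorch sketch of the optimizer wiring might look as follows; the tuple ordering, the variable names, and the Adam β1 of 0.9 are assumptions not stated in the excerpt.

```python
# Hedged reconstruction of the Table 3 hyperparameters. Tuples are read as
# (Discrete model, Continuous model); that pairing and beta1=0.9 are assumed.
import torch

TABLE_3 = {
    "num_nearest_neighbors": 10,
    "diffusion_steps": (100, 1000),
    "sampling_steps": (100, 250),
    "dropout": 0.1,
    "weight_decay": 4.5e-2,
    "batch_size": (512, 1600),
    "iterations": (150_000, 500_000),
    "learning_rate": (4.05e-4, 1.4e-4),
    "adam_beta2": (0.96, 0.9999),
    "adam_eps": 1.0e-8,
    "ema_decay": (0.99, 0.9999),
}

def make_optimizer(model: torch.nn.Module, variant: int) -> torch.optim.AdamW:
    """variant 0 = Discrete model, 1 = Continuous model (assumed ordering)."""
    return torch.optim.AdamW(
        model.parameters(),
        lr=TABLE_3["learning_rate"][variant],
        betas=(0.9, TABLE_3["adam_beta2"][variant]),  # beta1 not given in the excerpt
        eps=TABLE_3["adam_eps"],
        weight_decay=TABLE_3["weight_decay"],
    )
```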