kNN-Diffusion: Image Generation via Large-Scale Retrieval

Authors: Shelly Sheynin, Oron Ashual, Adam Polyak, Uriel Singer, Oran Gafni, Eliya Nachmani, Yaniv Taigman

ICLR 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "As evaluated by human studies and automatic metrics, our method achieves state-of-the-art results compared to existing approaches that train text-to-image generation models using an images-only dataset." |
| Researcher Affiliation | Industry | Shelly Sheynin, Oron Ashual, Adam Polyak, Uriel Singer, Oran Gafni, Eliya Nachmani, Yaniv Taigman (Equal Contribution), Meta AI, {shellysheynin,oron}@meta.com |
| Pseudocode | Yes | "In Algorithm 1 we include pseudocode for the core of the retrieval-database implementation." |
| Open Source Code | No | The paper mentions using other authors' official code for baselines (e.g., Text2Live, Textual Inversion, LAFITE) but does not provide an explicit statement about releasing its own source code for kNN-Diffusion, nor a link to it. |
| Open Datasets | Yes | "For photo-realistic experiments, our model was trained only on the images (omitting the text) of a modified version of the Public Multimodal Dataset (PMD) used by FLAVA (Singh et al., 2021). ... The modified PMD dataset is composed of the following set of publicly available text-image datasets: SBU Captions (Ordonez et al., 2011), Localized Narratives (Pont-Tuset et al., 2020), Conceptual Captions (Sharma et al., 2018), Visual Genome (Krishna et al., 2016), Wikipedia Image Text (Srinivasan et al., 2021), Conceptual Captions 12M (Changpinyo et al., 2021), RedCaps (Desai et al., 2021), and a filtered version of YFCC100M (Thomee et al., 2015)." |
| Dataset Splits | Yes | "We follow the evaluation protocol of LAFITE, reporting our results on 30,000 images from the MS-COCO validation set without training on it, nor using its training partition in the kNN index." |
| Hardware Specification | Yes | Table 3: Training details of our models - #GPUs: 128 A100 (Discrete model) and 200 A100 (Continuous model). |
| Software Dependencies | No | The paper mentions several software components such as FAISS, VQ-VAE, and VQGAN, along with citations to their respective papers or authors. However, it does not specify explicit version numbers for these libraries or for core dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Table 3: Training details of our models - Number of nearest neighbors 10, Diffusion steps (100, 1000), Sampling steps (100, 250), Dropout 0.1, Weight decay 4.5e-2, Batch size (512, 1600), Iterations (150K, 500K), Learning rate (4.05e-4, 1.4e-4), Optimizer AdamW, Adam β2 (0.96, 0.9999), Adam ϵ 1.0e-8, EMA decay (0.99, 0.9999). |
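The retrieval component referenced in the rows above (a kNN index over image embeddings, built with FAISS in the paper, with k=10 neighbors per Table 3) can be sketched in plain NumPy as follows. This is a minimal cosine-similarity sketch, not the paper's implementation: the embedding dimension, random stand-in data, and function names are illustrative assumptions, and real image embeddings (e.g., from a CLIP-style encoder) would replace the random vectors.

```python
import numpy as np

def build_index(embeddings: np.ndarray) -> np.ndarray:
    # L2-normalize rows so that an inner product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / norms

def knn_search(index: np.ndarray, query: np.ndarray, k: int = 10):
    # k=10 matches the "Number of nearest neighbors" row in Table 3.
    q = query / np.linalg.norm(query)
    sims = index @ q                    # cosine similarity to every database item
    ids = np.argsort(-sims)[:k]         # indices of the k most similar items
    return ids, sims[ids]

# Hypothetical stand-in for real image embeddings; 512 dims is an assumption.
rng = np.random.default_rng(0)
db = build_index(rng.standard_normal((1000, 512)).astype("float32"))

ids, sims = knn_search(db, db[3], k=10)
# db[3] is its own nearest neighbor, so ids[0] == 3 and sims[0] ≈ 1.0
```

A production index would replace the brute-force `index @ q` with an approximate-nearest-neighbor structure (as FAISS provides) to scale to the hundreds of millions of images in PMD.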