Retrieval-Augmented Diffusion Models

Authors: Andreas Blattmann, Robin Rombach, Kaan Oktay, Jonas Müller, Björn Ommer

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | As demonstrated by our experiments, simply swapping the database for one with different contents transfers a trained model post-hoc to a novel domain. The evaluation shows competitive performance on tasks which the generative model has not been trained on, such as class-conditional synthesis, zero-shot stylization or text-to-image synthesis without requiring paired text-image data. (The database-swap mechanism is illustrated in the retrieval sketch after the table.)
Researcher Affiliation | Academia | Andreas Blattmann, Robin Rombach, Kaan Oktay, Jonas Müller, Björn Ommer; LMU Munich, MCML & IWR, Heidelberg University, Germany
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/CompVis/retrieval-augmented-diffusion-models
Open Datasets | Yes | We train RDMs on the dogs-subset of ImageNet [13] with i) WikiArt [66] (RDM-WA), ii) MS-COCO [7] (RDM-COCO) and iii) 20M examples obtained by cropping images (see App. F.1) from OpenImages [46] as train database D_train...
Dataset Splits | Yes | We evaluate their performance on the ImageNet train and validation sets in Tab. 1, which shows RDM-OI to closely reach the performance of RDM-IN in CLIP-FID [48] and achieve more diverse results.
Hardware Specification | No | The paper states that the total amount of compute and type of resources used are detailed in the supplemental material, but no specific hardware models (e.g., GPU/CPU types) are mentioned in the main text.
Software Dependencies | No | The paper mentions software such as ScaNN and CLIP ViT-B/32 but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | For hyperparameters, implementation and evaluation details cf. Sec. F. For ImageNet, samples are generated with m = 0.01, guidance scale s = 2.0 and 100 DDIM steps for RDM, and with m = 0.05, guidance scale s = 3.0 and top-k = 2048 for RARM. On FFHQ we use s = 1.0, m = 0.1. (A hedged sampling sketch based on these settings follows the retrieval sketch below.)
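
To make the database-swap claim in the "Research Type" row concrete, the following is a minimal sketch of retrieval-augmented conditioning: encode a query with a CLIP-style image encoder, fetch its k nearest neighbors from an external embedding database by cosine similarity, and condition the generator on those neighbor embeddings. Swapping the database embeddings for those of a different dataset (e.g. WikiArt instead of OpenImages) changes the output domain without retraining. All names here (retrieve_neighbors, database_embeddings, diffusion_model) are hypothetical; the paper's actual pipeline uses CLIP and ScaNN, whose APIs are not reproduced here, and brute-force similarity search stands in for the approximate nearest-neighbor index.

```python
import torch
import torch.nn.functional as F

def retrieve_neighbors(query_emb: torch.Tensor,
                       database_embeddings: torch.Tensor,
                       k: int = 4) -> torch.Tensor:
    """Return the k nearest database embeddings by cosine similarity.

    query_emb:           (d,) query embedding
    database_embeddings: (N, d) embeddings of the external retrieval database
    """
    q = F.normalize(query_emb, dim=-1)
    db = F.normalize(database_embeddings, dim=-1)
    sims = db @ q                      # (N,) cosine similarities to the query
    idx = sims.topk(k).indices         # indices of the k nearest neighbors
    return database_embeddings[idx]    # (k, d) neighbor embeddings

# --- usage sketch (random placeholders instead of real CLIP embeddings) ---
d, N = 512, 20_000
database_embeddings = torch.randn(N, d)   # stand-in for a CLIP-embedded database
query_emb = torch.randn(d)                # stand-in for the query's CLIP embedding

neighbors = retrieve_neighbors(query_emb, database_embeddings, k=4)
# A retrieval-augmented generator would now be conditioned on `neighbors`,
# e.g. eps = diffusion_model(x_t, t, cond=neighbors)   # hypothetical call
```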
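
The "Experiment Setup" row reports 100 DDIM steps and a guidance scale s = 2.0 for ImageNet sampling with RDM. As a rough illustration of how those two numbers enter a sampler, here is a minimal deterministic DDIM loop (eta = 0) with classifier-free guidance. The model(x, t, cond) interface, the linear beta schedule, and the tensor shapes are assumptions for illustration only and do not reproduce the released code; the retrieval-specific parameter m is not modeled here.

```python
import torch

@torch.no_grad()
def ddim_sample_cfg(model, cond, uncond, *, shape=(1, 3, 64, 64),
                    num_steps=100, guidance_scale=2.0, num_train_steps=1000):
    """Deterministic DDIM sampling (eta = 0) with classifier-free guidance.

    model(x, t, cond) is assumed to predict the noise eps; `cond` and `uncond`
    are the conditional and unconditional (null) conditionings.
    """
    # Assumed linear beta schedule; the actual training schedule may differ.
    betas = torch.linspace(1e-4, 2e-2, num_train_steps)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)

    timesteps = torch.linspace(num_train_steps - 1, 0, num_steps).long()
    x = torch.randn(shape)

    for i, t in enumerate(timesteps):
        # Classifier-free guidance: eps = eps_u + s * (eps_c - eps_u)
        eps_c = model(x, t, cond)
        eps_u = model(x, t, uncond)
        eps = eps_u + guidance_scale * (eps_c - eps_u)

        a_t = alpha_bar[t]
        a_prev = alpha_bar[timesteps[i + 1]] if i + 1 < num_steps else torch.tensor(1.0)

        # Predict x0 from the current noise estimate, then take the
        # deterministic DDIM step toward the previous timestep.
        x0 = (x - torch.sqrt(1.0 - a_t) * eps) / torch.sqrt(a_t)
        x = torch.sqrt(a_prev) * x0 + torch.sqrt(1.0 - a_prev) * eps

    return x
```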