Retrieval-Augmented Diffusion Models
Authors: Andreas Blattmann, Robin Rombach, Kaan Oktay, Jonas Müller, Björn Ommer
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As demonstrated by our experiments, simply swapping the database for one with different contents transfers a trained model post-hoc to a novel domain. The evaluation shows competitive performance on tasks which the generative model has not been trained on, such as class-conditional synthesis, zero-shot stylization or text-to-image synthesis without requiring paired text-image data. |
| Researcher Affiliation | Academia | Andreas Blattmann Robin Rombach Kaan Oktay Jonas Müller Björn Ommer LMU Munich, MCML & IWR, Heidelberg University, Germany |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/CompVis/retrieval-augmented-diffusion-models |
| Open Datasets | Yes | We train RDMs on the dogs-subset of ImageNet [13] with i) WikiArt [66] (RDM-WA), ii) MS-COCO [7] (RDM-COCO) and iii) 20M examples obtained by cropping images (see App. F.1) from Open Images [46] as train database D_train... |
| Dataset Splits | Yes | We evaluate their performance on the ImageNet train- and validation-sets in Tab. 1, which shows RDM-OI to closely reach the performance of RDM-IN in CLIP-FID [48] and achieve more diverse results. |
| Hardware Specification | No | The paper states that the total amount of compute and type of resources used are detailed in the supplemental material, but no specific hardware models (e.g., GPU/CPU types) are mentioned in the main text. |
| Software Dependencies | No | The paper mentions software such as ScaNN and CLIP-ViT-B/32 but does not provide specific version numbers for these or other software dependencies (both figure in the retrieval sketch after the table). |
| Experiment Setup | Yes | For hyperparameters, implementation and evaluation details cf. Sec. F. For ImageNet, samples are generated with m = 0.01, guidance scale s = 2.0 and 100 DDIM steps for RDM, and with m = 0.05, guidance scale s = 3.0 and top-k = 2048 for RARM. On FFHQ we use s = 1.0, m = 0.1. (These settings are collected in the configuration sketch after the table.) |
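
The database-swapping mechanism quoted in the rows above is straightforward to illustrate. The following is a minimal sketch, not the authors' implementation: the paper uses a frozen CLIP-ViT-B/32 image encoder and a ScaNN index, while here exact brute-force cosine search over random placeholder vectors stands in for both.

```python
import numpy as np

# Placeholder stand-in: in the paper, these embeddings come from a frozen
# CLIP-ViT-B/32 image encoder and neighbor search is done with a ScaNN index.
rng = np.random.default_rng(0)
database = rng.standard_normal((20_000, 512)).astype(np.float32)  # D_train embeddings
database /= np.linalg.norm(database, axis=1, keepdims=True)       # unit norm -> cosine sim

def retrieve(query: np.ndarray, k: int = 4) -> np.ndarray:
    """Return the k nearest database embeddings to `query` (exact search)."""
    q = query / np.linalg.norm(query)
    sims = database @ q                       # cosine similarity to every entry
    idx = np.argpartition(-sims, k)[:k]       # top-k indices, unordered
    return database[idx[np.argsort(-sims[idx])]]

# Post-hoc domain transfer amounts to swapping `database` for embeddings of a
# different dataset (e.g. WikiArt instead of ImageNet) at inference time; the
# retrieved neighbors then condition the trained diffusion model.
neighbors = retrieve(rng.standard_normal(512).astype(np.float32), k=4)
print(neighbors.shape)  # (4, 512)
```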
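
Likewise, the reported sampling hyperparameters can be gathered into a single configuration block. This is our own transcription of the numbers quoted in the Experiment Setup row; the key names and structure are hypothetical and do not reflect the released repository's config format.

```python
# Sampling hyperparameters as reported in the paper (see its Sec. F).
# `m` is the retrieval hyperparameter from the paper; consult the paper
# for its precise definition.
SAMPLING_CONFIGS = {
    "imagenet_rdm":  {"m": 0.01, "guidance_scale": 2.0, "ddim_steps": 100},
    "imagenet_rarm": {"m": 0.05, "guidance_scale": 3.0, "top_k": 2048},
    "ffhq_rdm":      {"m": 0.10, "guidance_scale": 1.0},
}
```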