In-Context Learning Unlocked for Diffusion Models
Authors: Zhendong Wang, Yifan Jiang, Yadong Lu, Yelong Shen, Pengcheng He, Weizhu Chen, Zhangyang "Atlas" Wang, Mingyuan Zhou
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to illustrate the power of Prompt Diffusion as a strong versatile vision-language foundation model that is capable of in-context learning. We first show that Prompt Diffusion performs well with multi-task training in Section 4.1. Then, we show that Prompt Diffusion has promising in-context learning ability and could generalize well to new, unseen tasks in Section 4.2. We show that Prompt Diffusion supports controllable and text-guided image editing in Section 4.3. Finally, we conduct an ablation study on the three main components of our vision-language prompt in Appendix A. |
| Researcher Affiliation | Collaboration | Zhendong Wang¹˒², Yifan Jiang¹, Yadong Lu², Yelong Shen², Pengcheng He², Weizhu Chen², Zhangyang Wang¹, and Mingyuan Zhou¹; ¹The University of Texas at Austin, ²Microsoft Azure AI |
| Pseudocode | No | The paper provides model architecture diagrams (Figure 12) and details of stacked convolution layers (Figure 13), but these are not presented as pseudocode or clearly labeled algorithm blocks with step-by-step procedures. |
| Open Source Code | Yes | We share our code and pre-trained models at https://github.com/Zhendong-Wang/Prompt-Diffusion. |
| Open Datasets | Yes | Datasets. We use the public dataset proposed by Brooks et al. [4] as our base dataset, which consists of around 310k image-caption pairs. |
| Dataset Splits | No | The paper mentions a 'test split of our base dataset [4]' but does not provide explicit training, validation, or test split percentages or sample counts needed to reproduce the data partitioning. No validation split is mentioned. |
| Hardware Specification | Yes | We train our model on 8 NVIDIA A100 GPUs for 5,000-20,000 steps. |
| Software Dependencies | No | The paper states that Prompt Diffusion is implemented on the ControlNet [69] codebase, but it does not provide version numbers for software dependencies such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, or CUDA). |
| Experiment Setup | Yes | We fix the learning rate at 1 × 10⁻⁴ and accumulate gradients every 4 mini-batches with batch size 512. (A minimal sketch of this gradient-accumulation pattern follows the table.) |
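The reported optimization setup (fixed learning rate of 1 × 10⁻⁴, gradients accumulated every 4 mini-batches, batch size 512 on 8 A100 GPUs) corresponds to a standard gradient-accumulation loop. Below is a minimal PyTorch sketch of that pattern for illustration only; the model, data loader, and loss function are hypothetical placeholders, not the authors' released training code.

```python
# Minimal sketch of the reported setup: fixed lr = 1e-4, gradient
# accumulation over 4 mini-batches. All components below are stand-ins
# (a toy model and dummy data), not the Prompt Diffusion implementation.
import torch

ACCUM_STEPS = 4       # accumulate gradients every 4 mini-batches (reported)
LEARNING_RATE = 1e-4  # fixed learning rate (reported)

model = torch.nn.Linear(16, 16)  # placeholder for the diffusion model
optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)

def loss_fn(batch):
    # Placeholder objective; the paper trains a diffusion denoising loss.
    return model(batch).pow(2).mean()

dataloader = [torch.randn(8, 16) for _ in range(20)]  # dummy mini-batches

optimizer.zero_grad()
for step, batch in enumerate(dataloader, start=1):
    # Scale each loss so the accumulated gradient equals the average over
    # the 4 mini-batches, matching a single larger effective batch.
    loss = loss_fn(batch) / ACCUM_STEPS
    loss.backward()
    if step % ACCUM_STEPS == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Under this pattern, the effective batch size is the per-step mini-batch size times the accumulation factor, which is presumably how the reported batch size of 512 is reached across the 8 GPUs.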