In-Context Learning Unlocked for Diffusion Models

Authors: Zhendong Wang, Yifan Jiang, Yadong Lu, Yelong Shen, Pengcheng He, Weizhu Chen, Zhangyang "Atlas" Wang, Mingyuan Zhou

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments to illustrate the power of Prompt Diffusion as a strong versatile vision-language foundation model that is capable of in-context learning. We first show that Prompt Diffusion performs well with multi-task training in Section 4.1. Then, we show that Prompt Diffusion has promising in-context learning ability and could generalize well to new, unseen tasks in Section 4.2. We show that Prompt Diffusion supports controllable and text-guided image editing in Section 4.3. Finally, we conduct an ablation study on the three main components of our vision-language prompt in Appendix A.
Researcher Affiliation | Collaboration | Zhendong Wang1,2, Yifan Jiang1, Yadong Lu2, Yelong Shen2, Pengcheng He2, Weizhu Chen2, Zhangyang Wang1, and Mingyuan Zhou1; 1The University of Texas at Austin, 2Microsoft Azure AI
Pseudocode | No | The paper provides model architecture diagrams (Figure 12) and details of stacked convolution layers (Figure 13), but these are not presented as pseudocode or clearly labeled algorithm blocks with step-by-step procedures.
Open Source Code | Yes | We share our code and pre-trained models at https://github.com/Zhendong-Wang/Prompt-Diffusion.
Open Datasets | Yes | Datasets. We use the public dataset proposed by Brooks et al. [4] as our base dataset, which consists of around 310k image-caption pairs.
Dataset Splits | No | The paper mentions a 'test split of our base dataset [4]' but does not provide explicit training/validation/test split percentages or sample counts needed to reproduce the data partitioning; no validation split is mentioned at all.
Hardware Specification | Yes | We train our model on 8 A100 Nvidia GPUs for 5000-20000 steps.
Software Dependencies | No | The paper mentions implementing Prompt Diffusion on the ControlNet [69] codebase but does not provide version numbers for software dependencies such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, or CUDA versions).
Experiment Setup | Yes | We fix the learning rate at 1 × 10^-4 and accumulate gradients every 4 mini-batches with batch size 512. (A minimal sketch of this schedule follows the table.)
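
For concreteness, here is a minimal, runnable sketch of the optimization schedule reported in the Hardware Specification and Experiment Setup rows: learning rate 1 × 10^-4 with gradients accumulated over 4 mini-batches. This is not the authors' Prompt Diffusion training code; the tiny linear model and random batches are hypothetical stand-ins, and the batch size is shrunk from the reported 512 so the snippet runs anywhere.

```python
# Hedged sketch of the reported schedule (lr 1e-4, 4-step gradient accumulation,
# 5,000-20,000 optimizer steps). Model, data, and loss are placeholders.
import torch
import torch.nn as nn

model = nn.Linear(16, 16)  # stand-in for the Prompt Diffusion network
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # lr from the paper

accum_steps = 4        # accumulate gradients every 4 mini-batches (from the paper)
batch_size = 8         # paper reports 512; reduced here so the sketch runs anywhere
max_optim_steps = 10   # paper reports 5,000-20,000 steps

optim_step = 0
micro_step = 0
while optim_step < max_optim_steps:
    x = torch.randn(batch_size, 16)                     # stand-in mini-batch
    loss = ((model(x) - x) ** 2).mean() / accum_steps   # scale loss for accumulation
    loss.backward()                                     # gradients add up across calls
    micro_step += 1
    if micro_step % accum_steps == 0:
        optimizer.step()       # one optimizer update per 4 mini-batches
        optimizer.zero_grad()
        optim_step += 1
```

Scaling the loss by `accum_steps` keeps the accumulated gradient equivalent to a single large-batch step, which is the usual reason for this pattern when the effective batch (here 512) exceeds per-GPU memory.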