Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
In-Context Learning Unlocked for Diffusion Models
Authors: Zhendong Wang, Yifan Jiang, Yadong Lu, Yelong Shen, Pengcheng He, Weizhu Chen, Zhangyang "Atlas" Wang, Mingyuan Zhou
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to illustrate the power of Prompt Diffusion as a strong versatile vision-language foundation model that is capable of in-context learning. We first show that Prompt Diffusion performs well with multi-task training in Section 4.1. Then, we show that Prompt Diffusion has promising in-context learning ability and could generalize well to new, unseen tasks in Section 4.2. We show that Prompt Diffusion supports controllable and text-guided image editing in Section 4.3. Finally, we conduct an ablation study on the three main components of our vision-language prompt in Appendix A. |
| Researcher Affiliation | Collaboration | Zhendong Wang1,2, Yifan Jiang1, Yadong Lu2, Yelong Shen2, Pengcheng He2, Weizhu Chen2, Zhangyang Wang1, and Mingyuan Zhou1 1The University of Texas at Austin, 2Microsoft Azure AI |
| Pseudocode | No | The paper provides model architecture diagrams (Figure 12) and details of stacked convolution layers (Figure 13), but these are not presented as pseudocode or clearly labeled algorithm blocks with step-by-step procedures. |
| Open Source Code | Yes | We share our code and pre-trained models at https://github.com/Zhendong-Wang/Prompt-Diffusion. |
| Open Datasets | Yes | Datasets. We use the public dataset proposed by Brooks et al. [4] as our base dataset, which consists of around 310k image-caption pairs. |
| Dataset Splits | No | The paper mentions a 'test split of our base dataset [4]' but does not provide explicit training, validation, or test split percentages or sample counts for the reproduction of data partitioning. There is no specific mention of a validation split. |
| Hardware Specification | Yes | We train our model on 8 A100 Nvidia GPUs for 5000-20000 steps. |
| Software Dependencies | No | The paper mentions implementing Prompt Diffusion upon the codebase of ControlNet [69], but does not provide specific version numbers for software dependencies such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | We fix the learning rate at $1 \times 10^{-4}$ and accumulate gradients every 4 mini-batches with batch size 512. |
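
The Experiment Setup row mentions gradient accumulation over 4 mini-batches. As a rough illustration of what that means, the sketch below applies one update per group of 4 mini-batches on a toy one-parameter least-squares problem. It is not the paper's training code: the model, the data, the plain SGD update, and all function names are hypothetical, and the learning rate is taken to be 1e-4. The quoted sentence also does not say whether 512 is the per-mini-batch size or the effective size after accumulation, so the sketch uses tiny batches and makes no claim about that.

```python
# Illustrative sketch only: a toy 1-D least-squares model, not the paper's code.
LEARNING_RATE = 1e-4  # value assumed from the Experiment Setup row
ACCUM_STEPS = 4       # "accumulate gradients every 4 mini-batches"


def grad_mse(w, batch):
    """Gradient of the mean squared error of y ~ w * x over one mini-batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def train_with_accumulation(w, mini_batches, lr=LEARNING_RATE, accum_steps=ACCUM_STEPS):
    """Take one update step per `accum_steps` mini-batches."""
    accumulated = 0.0
    for i, batch in enumerate(mini_batches, start=1):
        # Divide by accum_steps so the accumulated value is the group average.
        accumulated += grad_mse(w, batch) / accum_steps
        if i % accum_steps == 0:
            w -= lr * accumulated  # plain SGD step (assumed optimizer)
            accumulated = 0.0
    return w


# Hypothetical toy data: 16 points split into 8 mini-batches of 2 points each.
data = [(float(x), 3.0 * x) for x in range(1, 17)]
mini_batches = [data[i:i + 2] for i in range(0, len(data), 2)]
w_accum = train_with_accumulation(0.0, mini_batches)
```

With equal-sized mini-batches, each accumulated step matches a single step on the 4 mini-batches concatenated, which is why accumulation is commonly used to emulate a larger batch when memory is limited.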