Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners
Authors: Xuehai He, Weixi Feng, Tsu-Jui Fu, Varun Jampani, Arjun Reddy Akula, Pradyumna Narayana, S Basu, William Yang Wang, Xin Eric Wang
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By comparing Discffusion with state-of-the-art methods on several benchmark datasets, we demonstrate the potential of using pre-trained diffusion models for discriminative tasks with superior results on few-shot image-text matching. The paper contains a dedicated "5 Experiments" section detailing evaluations on multiple datasets, comparisons with baselines, and ablation studies, indicating empirical validation. |
| Researcher Affiliation | Collaboration | UC Santa Cruz, UC Santa Barbara, Stability AI, Google. The affiliations include both academic institutions (UC Santa Cruz, UC Santa Barbara) and industry companies (Stability AI, Google). |
| Pseudocode | Yes | The overall algorithm is shown in Algorithm 2. The paper provides "Algorithm 1: Discffusion Training" and "Algorithm 2: Discffusion Inference". |
| Open Source Code | No | The paper mentions using third-party libraries such as the "Accelerate library" and "Hugging Face Diffusers" but does not provide any explicit statement or link to the authors' own source code for Discffusion. |
| Open Datasets | Yes | We use the Compositional Visual Genome (ComVG) (Krishna et al., 2017) and RefCOCOg (Yu et al., 2016) datasets to do image-text matching. Additionally, we include the VQAv2 dataset (Antol et al., 2015). Winoground (Thrush et al., 2022) and VL-checklist (Zhao et al., 2022) are also included. The LAION dataset (Schuhmann et al., 2022) is used for pre-training, along with the MS-COCO dataset (Lin et al., 2014). |
| Dataset Splits | Yes | We then test Discffusion under the setting where we train the model with only 5% of the dataset (Yoo et al., 2021), demonstrating its adaptation capability using limited data. We have expanded our experimentation to extreme few-shot learning by conducting tests with only 0.5% of training data (27 examples from ComVG). |
| Hardware Specification | Yes | The inference was executed in a distributed manner on an NVIDIA workstation equipped with 4 A6000 GPUs. Remarkably, it requires only a single NVIDIA V100 GPU for training. |
| Software Dependencies | No | The paper mentions using "Stable Diffusion v2.1-base with the xFormers (Lefaudeux et al., 2022) and flash attention (Dao et al., 2022) implementation" as well as the "Accelerate library" and "Hugging Face Diffusers", but does not provide specific version numbers for these software components or libraries. |
| Experiment Setup | Yes | M is a predefined margin where we use 0.2 in our experiments. We use Stable Diffusion v2.1-base. On the Ref COCOg dataset, we sample 10 text prompts from the pool each time. The sampling was carried out using the DDIM (Song et al., 2020) method with a total of 50 steps. We set the noise level to {0.2, 0.4, 0.6, 0.8}. |
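The margin M = 0.2 cited in the Experiment Setup row suggests a margin-based ranking objective for image-text matching. The exact loss used by the paper is not quoted here, so the hinge formulation below is only an illustrative sketch under that assumption, not the authors' verified implementation:

```python
# Hypothetical sketch of a margin-based matching loss. The paper states
# M = 0.2; the specific hinge form and the score inputs are assumptions.

def margin_ranking_loss(pos_score: float, neg_score: float, margin: float = 0.2) -> float:
    """Penalize cases where the matching (positive) image-text pair does not
    outscore a mismatched (negative) pair by at least `margin`."""
    return max(0.0, margin - pos_score + neg_score)

# Positive pair outscores the negative by 0.4 (>= margin), so no penalty:
print(margin_ranking_loss(0.9, 0.5))  # 0.0
# Gap of only 0.1 (< margin 0.2) incurs a small loss (~0.1):
print(margin_ranking_loss(0.6, 0.5))
```

With a gap larger than the margin the loss vanishes; otherwise the loss grows linearly with the shortfall, which is the standard behavior of a hinge ranking loss.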