ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models

Authors: Maitreya Patel, Tejas Gokhale, Chitta Baral, Yezhou Yang

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To quantify the ability of T2I models in learning and synthesizing novel visual concepts (a.k.a. personalized T2I), we introduce CONCEPTBED, a large-scale dataset that consists of 284 unique visual concepts and 33K composite text prompts. Along with the dataset, we propose an evaluation metric, Concept Confidence Deviation (CCD), that uses the confidence of oracle concept classifiers to measure the alignment between concepts generated by T2I generators and concepts contained in target images. We evaluate visual concepts that are either objects, attributes, or styles, and also evaluate four dimensions of compositionality: counting, attributes, relations, and actions. Our human study shows that CCD is highly correlated with human understanding of concepts.
Researcher Affiliation | Academia | (1) Arizona State University; (2) University of Maryland, Baltimore County
Pseudocode | Yes | Algorithm 1: Concept Confidence Deviation (a hedged re-implementation sketch appears below this table).
Open Source Code | Yes | The data, code, and interactive demo are available at: https://conceptbed.github.io/
Open Datasets | Yes | CONCEPTBED incorporates existing datasets such as ImageNet (Deng et al. 2009), PACS (Li et al. 2017), CUB (Wah et al. 2011), and Visual Genome (Krishna et al. 2017), enabling the creation of a labeled dataset.
Dataset Splits | No | The paper states it trains oracles on the CONCEPTBED training dataset, $D^{\text{train}}_{\text{CONCEPTBED}}$, and calculates CCD using "test ground truth images", but it does not specify explicit percentages or sample counts for train/validation/test splits, nor does it detail a split methodology for reproducibility (a sketch of what such a specification could look like appears below the table).
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU types, memory) used for running its experiments.
Software Dependencies | No | The paper mentions models and frameworks used (e.g., ResNet18, ConvNeXt, ViLT, CLIP) but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | No | The paper describes the scale of experiments (e.g., N=100 images, N=3 images for composite prompts, 1,100+ models, 500,000 generated images) but does not provide specific experimental setup details such as hyperparameters (learning rate, batch size, number of epochs, optimizer settings) or training configurations for the concept-learning models or oracles.
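
For readers without access to Algorithm 1, here is a minimal sketch of how Concept Confidence Deviation could be computed from oracle classifier confidences, per the description in the abstract. The difference-of-mean-confidences formulation, the PyTorch oracle returning class logits, and the function name `concept_confidence_deviation` are assumptions, not confirmed details of the paper's algorithm.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def concept_confidence_deviation(oracle, real_images, generated_images, concept_idx):
    """Sketch of CCD: deviation of the oracle's confidence in the target concept
    on generated images from its confidence on real ground-truth test images.
    (Difference-of-means is an assumed reading of Algorithm 1, not a verbatim copy.)
    """
    # The oracle is assumed to be a classifier returning logits of shape (B, num_concepts).
    p_real = F.softmax(oracle(real_images), dim=-1)[:, concept_idx].mean()
    p_gen = F.softmax(oracle(generated_images), dim=-1)[:, concept_idx].mean()
    # A deviation near zero means generated images evoke the concept as strongly as real ones.
    return (p_real - p_gen).item()
```

Under this reading, CCD near zero indicates faithful concept learning, while a large positive deviation indicates the generator fails to reproduce the concept as confidently as the real data does.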
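
Because the split methodology is undocumented (see the Dataset Splits row), any reproduction must fix its own. Below is a minimal sketch of a seeded, per-concept split; the 80/20 ratio, the seed, and the helper name `split_concept_images` are hypothetical choices, not values from the paper.

```python
import random

def split_concept_images(image_paths, train_frac=0.8, seed=0):
    """Deterministic per-concept train/test split.
    The 80/20 ratio and the seed are hypothetical; the paper does not specify them."""
    rng = random.Random(seed)
    paths = sorted(image_paths)  # sort first so shuffling is reproducible across runs
    rng.shuffle(paths)
    cut = int(len(paths) * train_frac)
    return paths[:cut], paths[cut:]
```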