Locating What You Need: Towards Adapting Diffusion Models to OOD Concepts In-the-Wild

Authors: Jianan Yang, Chenchao Gao, Zhiqing Xiao, Junbo Zhao, Sai Wu, Gang Chen, Haobo Wang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The extensive results show that CATOD significantly outperforms the prior approaches with an 11.10 boost on the CLIP score and a 33.08% decrease on the CMMD metric.
Researcher Affiliation | Academia | College of Computer Science and Technology, Zhejiang University; Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security; School of Software Technology, Zhejiang University; International School of Information Science and Engineering, Dalian University of Technology
Pseudocode | Yes | An overall algorithm for a cycle of CATOD is provided in Alg. 1 ("Algorithm 1: An active selection cycle for CATOD").
Open Source Code | Yes | The source code is attached in the Supplementary.
Open Datasets | Yes | This dataset consists of 5 categories: insect, lizard, penguin, seafish, and snake, and each category contains data from 5 OOD concepts. Each concept has 1,000 examples in total, with 100 samples left out for validation. The dataset is collected from publicly available datasets including ImageNet, iNaturalist 2018 [67], and IP102 [71].
Dataset Splits | Yes | Each concept has 1,000 examples in total, with 100 samples left out for validation.
Hardware Specification | No | The paper mentions using Stable Diffusion 2.0 pre-trained on LAION-5B but does not specify the hardware (e.g., GPU models, CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using Stable Diffusion 2.0 but does not provide version numbers for any software dependencies, libraries, or programming languages.
Experiment Setup | Yes | Each experiment starts with 20 randomly sampled instances, and we conduct 5 cycles of data accumulation in which we select 20 good samples to add to the training pool. We train 20 epochs for all combinations of adaptation techniques and sampling strategies in each active learning cycle, with a batch size of 1. Furthermore, we generate 100 images for each concept for evaluation. We use the commonly adopted Stable Diffusion 2.0 pre-trained on LAION-5B [51], following Rombach's work [45].
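
For concreteness, the Experiment Setup and Dataset Splits rows above can be read as a simple active-learning loop. The sketch below is not the authors' released code: the helper functions (fine_tune_diffusion_model, score_candidates, generate_images) are hypothetical placeholders for CATOD's adaptation step, its sample scoring, and image generation, and the order of training versus selection within a cycle is an assumption. Only the numeric settings (20 initial samples, 5 cycles, 20 samples per cycle, 20 epochs, batch size 1, 100 evaluation images, 1,000 examples per concept with 100 held out) come from the report.

```python
import random

# Hyperparameters taken from the reported experiment setup.
INITIAL_POOL_SIZE = 20      # experiments start with 20 randomly sampled instances
NUM_CYCLES = 5              # 5 cycles of data accumulation
SAMPLES_PER_CYCLE = 20      # 20 "good" samples added per cycle
EPOCHS_PER_CYCLE = 20       # 20 training epochs per active-learning cycle
BATCH_SIZE = 1
IMAGES_PER_CONCEPT = 100    # 100 images generated per concept for evaluation


def fine_tune_diffusion_model(train_pool, epochs, batch_size):
    """Hypothetical stand-in for adapting Stable Diffusion 2.0 to the OOD concept."""
    print(f"fine-tuning on {len(train_pool)} samples for {epochs} epochs (batch size {batch_size})")


def score_candidates(candidates):
    """Hypothetical stand-in for CATOD's sample scoring; returns random scores here."""
    return {c: random.random() for c in candidates}


def generate_images(n):
    """Hypothetical stand-in for sampling n images from the adapted model."""
    return [f"image_{i}" for i in range(n)]


# Candidate pool for one OOD concept: 1,000 examples minus 100 held out for validation.
candidate_pool = [f"sample_{i}" for i in range(900)]

# Cycle 0: start from a small random seed set.
train_pool = random.sample(candidate_pool, INITIAL_POOL_SIZE)
candidate_pool = [c for c in candidate_pool if c not in train_pool]

for cycle in range(NUM_CYCLES):
    fine_tune_diffusion_model(train_pool, EPOCHS_PER_CYCLE, BATCH_SIZE)

    # Select the highest-scoring candidates and add them to the training pool.
    scores = score_candidates(candidate_pool)
    selected = sorted(candidate_pool, key=scores.get, reverse=True)[:SAMPLES_PER_CYCLE]
    train_pool.extend(selected)
    candidate_pool = [c for c in candidate_pool if c not in selected]

# After the final cycle, generate images for evaluation.
eval_images = generate_images(IMAGES_PER_CONCEPT)
```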
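
The reported gains are measured with the CLIP score and CMMD. As an illustration of how such an evaluation could be run, the snippet below computes a CLIP score with torchmetrics; this is an assumed tooling choice (it requires the transformers package and downloads a CLIP checkpoint), not the authors' evaluation code, the prompt is a hypothetical example, and CMMD is omitted because it has no equally standard off-the-shelf implementation.

```python
import torch
from torchmetrics.multimodal.clip_score import CLIPScore

# One common way to compute a CLIP score for generated images (assumed tooling).
metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")

# Dummy batch: 100 generated images per concept, as uint8 tensors in [0, 255].
images = torch.randint(0, 255, (100, 3, 224, 224), dtype=torch.uint8)
prompts = ["a photo of a penguin"] * 100  # hypothetical concept prompt

score = metric(images, prompts)
print(f"CLIP score: {score.item():.2f}")
```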