Suppress Content Shift: Better Diffusion Features via Off-the-Shelf Generation Techniques

Authors: Benyuan Meng, Qianqian Xu, Zitai Wang, Zhiyong Yang, Xiaochun Cao, Qingming Huang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 6 Experimental Validation
Researcher Affiliation | Academia | Benyuan Meng (1,2), Qianqian Xu (3,4), Zitai Wang (3), Zhiyong Yang (5), Xiaochun Cao (6), Qingming Huang (5,3,7). (1) Institute of Information Engineering, CAS; (2) School of Cyber Security, University of Chinese Academy of Sciences; (3) Key Lab. of Intelligent Information Processing, Institute of Computing Technology, CAS; (4) Peng Cheng Laboratory; (5) School of Computer Science and Tech., University of Chinese Academy of Sciences; (6) School of Cyber Science and Tech., Shenzhen Campus of Sun Yat-sen University; (7) Key Laboratory of Big Data Mining and Knowledge Management, CAS
Pseudocode | No | The paper describes procedures and guidelines but does not include formal pseudocode blocks or algorithm listings.
Open Source Code | Yes | Our code is available at this url.
Open Datasets | Yes | SPair-71k [26] dataset, label-scarce semantic segmentation using Bedroom-28 [54] and Horse-21 [54] datasets, and standard semantic segmentation using ADE20K [61] and Cityscapes [5] datasets. ... SPair-71k: Available at https://cvlab.postech.ac.kr/research/SPair-71k/. Horse-21 and Bedroom-28 (LSUN): Available at https://github.com/fyu/lsun. ADE20K: Custom (research-only, non-commercial), at https://groups.csail.mit.edu/vision/datasets/ADE20K/terms/. Cityscapes: Custom, at https://www.cityscapes-dataset.com/license/.
Dataset Splits | No | The paper mentions using training data and running experiments on 'different splits' for some datasets (Horse-21 and Bedroom-28), but does not provide specific percentages or counts for training, validation, and test splits for the main datasets used in the experiments.
Hardware Specification | Yes | We use Nvidia(R) RTX 3090 and Nvidia(R) RTX 4090 GPUs for the experiments, all with 24GB VRAM.
Software Dependencies | No | For the diffusion model to extract features, we choose Stable Diffusion v1.5 [34] to be consistent with SOTA competitors. ... Fortunately, the community has offered many handy tools for ordinary users, such as kohya_ss, which can be directly utilized for our purpose.
Experiment Setup | Yes | All tasks extract features at t = 50. When ControlNet is applied, except for standard semantic segmentation, we additionally start multi-step denoising from t = 60. ... γ1 and γ2, the hyper-parameters for the proposed regularization terms, are tuned for each dataset.
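
To give a feel for what the timesteps in the Experiment Setup entry mean, the following is a minimal sketch of forward-diffusion noising at t = 50 and t = 60. It is not the authors' code. It assumes a 1000-step scaled-linear beta schedule with beta_start = 0.00085 and beta_end = 0.012 (the values commonly used with Stable Diffusion v1.5), zero-based timestep indexing, and a toy four-element list standing in for a latent. The function names are hypothetical.

```python
import math
import random


def scaled_linear_alpha_bars(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Cumulative products of (1 - beta_t) for an ASSUMED scaled-linear schedule."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    alpha_bars, running = [], 1.0
    for i in range(num_steps):
        # Interpolate linearly in sqrt(beta) space, then square.
        beta = (lo + (hi - lo) * i / (num_steps - 1)) ** 2
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps, eps ~ N(0, 1)."""
    a = alpha_bars[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]


FEATURE_T = 50        # from the setup: features are extracted at t = 50
CONTROL_START_T = 60  # from the setup: denoising starts at t = 60 with ControlNet

alpha_bars = scaled_linear_alpha_bars()
rng = random.Random(0)
x0 = [0.5, -1.0, 0.25, 2.0]  # toy stand-in for a latent (hypothetical values)

x_feat = add_noise(x0, FEATURE_T, alpha_bars, rng)
x_start = add_noise(x0, CONTROL_START_T, alpha_bars, rng)

# Fraction of the original signal amplitude kept at each timestep.
signal_at_feature_t = math.sqrt(alpha_bars[FEATURE_T])
signal_at_start_t = math.sqrt(alpha_bars[CONTROL_START_T])
```

Under this assumed schedule both timesteps sit near the low-noise end of the range, so the noised sample stays close to the input. The sketch covers only the noising step; the feature extraction itself, the ControlNet conditioning, and the γ1 and γ2 regularization terms are outside its scope.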