reproducibilityindex.ai

Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance

Authors: Kuan Heng Lin, Sicheng Mo, Ben Klingher, Fangzhou Mu, Bolei Zhou

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive qualitative and quantitative experiments illustrate the superior performance of Ctrl-X on various condition inputs and model checkpoints.
Researcher Affiliation	Collaboration	Kuan Heng Lin¹*, Sicheng Mo¹*, Ben Klingher¹, Fangzhou Mu², Bolei Zhou¹ (¹University of California, Los Angeles; ²NVIDIA)
Pseudocode	No	The paper describes the method using prose and diagrams (Figure 3), but it does not include any formal pseudocode blocks or algorithm listings.
Open Source Code	Yes	We publicly release our code and our data (for quantitative evaluation) at https://github.com/genforce/ctrl-x.
Open Datasets	Yes	We publicly release our dataset in our code release: https://github.com/genforce/ctrl-x. Our dataset consists of 177 1024×1024 images divided into 16 types and across 7 categories.
Dataset Splits	No	The paper mentions creating a new dataset for evaluation ('256 diverse structure-appearance pairs') and describes its composition. However, it does not specify explicit training, validation, or test dataset splits for model evaluation. It only mentions selecting '15 sample pairs' for a user study, which is not a general dataset split for the main quantitative evaluations.
Hardware Specification	Yes	We implement Ctrl-X with Diffusers [37] and run all experiments on a single NVIDIA A6000 GPU, except evaluating inference efficiency in Table 1 where we run on a single NVIDIA H100 GPU.
Software Dependencies	No	The paper states, 'We implement Ctrl-X with Diffusers [37]', citing the Diffusers library. However, it does not specify a particular version number for Diffusers or any other software dependency, which is required for reproducibility.
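Because the missing version numbers are what make this variable fail, a reproducer could record them at run time instead. The Python sketch below uses only the standard-library importlib.metadata module. The helper name record_versions is hypothetical, and the package list beyond Diffusers (torch, transformers) is an assumption about a typical SDXL stack, not something the paper states.

```python
from importlib import metadata


def record_versions(packages):
    """Hypothetical helper: map each package name to its installed version,
    or None if the distribution is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Only "diffusers" is named in the paper; the other two are assumptions.
    for name, ver in record_versions(["diffusers", "torch", "transformers"]).items():
        # Emit requirements-style pins, or a comment for missing packages.
        print(f"{name}=={ver}" if ver else f"# {name}: not installed")
```

Saving this output next to the experiment results pins the environment actually used, without relying on the paper to report it.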
Experiment Setup	Yes	For SDXL, we set $L_{\text{feat}} = \{0\}_{\text{decoder}}$, $L_{\text{self}} = \{0, 1, 2\}_{\text{decoder}}$, $L_{\text{app}} = \{1, 2, 3, 4\}_{\text{decoder}} \cup \{2, 3, 4, 5\}_{\text{encoder}}$, and $\tau^s = \tau^a = 0.6$. We sample $I^o$ with 50 steps of DDIM sampling and set $\eta = 1$ [33], doing self-recurrence for $n_r = 2$ for $\tau^r_0 = 0.1$ and $\tau^r_1 = 0.5$.
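The SDXL settings above can be collected into a small config and expanded into a per-step plan. This is a minimal Python sketch, not code from the Ctrl-X repository: the names CtrlXSDXLConfig, appearance_layers, and step_plan are hypothetical, and it assumes each τ value is a fraction of the 50 DDIM steps counted from the first (noisiest) step. Under that assumption τ^s = τ^a = 0.6 covers the first 30 steps and the self-recurrence window [0.1, 0.5) covers steps 5 to 24.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CtrlXSDXLConfig:
    # Hypothetical container for the quoted SDXL settings; field names are
    # illustrative and do not come from the Ctrl-X repository.
    feat_decoder: frozenset = frozenset({0})           # L_feat
    self_decoder: frozenset = frozenset({0, 1, 2})     # L_self
    app_decoder: frozenset = frozenset({1, 2, 3, 4})   # decoder part of L_app
    app_encoder: frozenset = frozenset({2, 3, 4, 5})   # encoder part of L_app
    tau_s: float = 0.6         # structure control schedule
    tau_a: float = 0.6         # appearance control schedule
    num_steps: int = 50        # DDIM sampling steps
    eta: float = 1.0           # DDIM eta
    n_r: int = 2               # self-recurrence repetitions
    tau_r: tuple = (0.1, 0.5)  # self-recurrence window (tau^r_0, tau^r_1)


def appearance_layers(cfg):
    """L_app as the union of tagged decoder and encoder layer indices."""
    return ({("decoder", i) for i in cfg.app_decoder}
            | {("encoder", i) for i in cfg.app_encoder})


def step_plan(cfg):
    """Per-step flags, ASSUMING every tau is a fraction of the denoising
    steps counted from the first (noisiest) step."""
    struct_end = round(cfg.tau_s * cfg.num_steps)    # 30 for 0.6 * 50
    app_end = round(cfg.tau_a * cfg.num_steps)       # 30 for 0.6 * 50
    rec_start = round(cfg.tau_r[0] * cfg.num_steps)  # 5
    rec_end = round(cfg.tau_r[1] * cfg.num_steps)    # 25 (exclusive)
    return [
        {
            "structure": i < struct_end,
            "appearance": i < app_end,
            "recurrence": cfg.n_r if rec_start <= i < rec_end else 0,
        }
        for i in range(cfg.num_steps)
    ]


if __name__ == "__main__":
    cfg = CtrlXSDXLConfig()
    plan = step_plan(cfg)
    print(sum(p["structure"] for p in plan), "steps with structure control")
    print(sum(p["recurrence"] > 0 for p in plan), "steps with self-recurrence")
    print(sorted(appearance_layers(cfg)))
```

Integer step boundaries are computed with round() so floating-point products such as 0.1 * 50 do not shift a window edge by one step. If the released code instead measures τ on the normalized diffusion timestep, the same windows would fall toward the end of sampling rather than the start.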