Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance
Authors: Kuan Heng Lin, Sicheng Mo, Ben Klingher, Fangzhou Mu, Bolei Zhou
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive qualitative and quantitative experiments illustrate the superior performance of Ctrl-X on various condition inputs and model checkpoints. |
| Researcher Affiliation | Collaboration | Kuan Heng Lin¹*, Sicheng Mo¹*, Ben Klingher¹, Fangzhou Mu², Bolei Zhou¹. ¹University of California, Los Angeles; ²NVIDIA |
| Pseudocode | No | The paper describes the method using prose and diagrams (Figure 3), but it does not include any formal pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | We publicly release our code and our data (for quantitative evaluation) at https://github.com/genforce/ctrl-x. |
| Open Datasets | Yes | We publicly release our dataset in our code release: https://github.com/genforce/ctrl-x. Our dataset consists of 177 1024 × 1024 images divided into 16 types and across 7 categories. |
| Dataset Splits | No | The paper mentions creating a new dataset for evaluation ('256 diverse structure-appearance pairs') and describes its composition. However, it does not specify explicit training, validation, or test dataset splits for model evaluation. It only mentions selecting '15 sample pairs' for a user study, which is not a general dataset split for the main quantitative evaluations. |
| Hardware Specification | Yes | We implement Ctrl-X with Diffusers [37] and run all experiments on a single NVIDIA A6000 GPU, except evaluating inference efficiency in Table 1 where we run on a single NVIDIA H100 GPU. |
| Software Dependencies | No | The paper states, 'We implement Ctrl-X with Diffusers [37]', citing the Diffusers library. However, it does not specify a particular version number for Diffusers or any other software dependency, which is required for reproducibility. |
| Experiment Setup | Yes | For SDXL, we set $L^{\text{feat}} = \{0\}_{\text{decoder}}$, $L^{\text{self}} = \{0, 1, 2\}_{\text{decoder}}$, $L^{\text{app}} = \{1, 2, 3, 4\}_{\text{decoder}} \cup \{2, 3, 4, 5\}_{\text{encoder}}$, and $\tau^{s} = \tau^{a} = 0.6$. We sample $I^{o}$ with 50 steps of DDIM sampling and set $\eta = 1$ [33], doing self-recurrence for $n^{r} = 2$ for $\tau^{r}_{0} = 0.1$ and $\tau^{r}_{1} = 0.5$. |
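
The SDXL settings quoted in the Experiment Setup row can be collected into a small configuration object, which may help when checking a reproduction against the reported values. The sketch below is not taken from the released code: the class name `CtrlXSDXLConfig`, its field names, and both helper functions are hypothetical. It also assumes that each τ value is a fraction of the normalized sampling trajectory (control active for the first 60% of steps, self-recurrence active between 10% and 50%), which the quoted text does not spell out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CtrlXSDXLConfig:
    """Hypothetical container for the SDXL hyperparameters quoted above."""
    # Layer index sets, keyed by U-Net half (values copied from the table).
    feat_layers: dict = field(default_factory=lambda: {"decoder": [0]})
    self_layers: dict = field(default_factory=lambda: {"decoder": [0, 1, 2]})
    app_layers: dict = field(
        default_factory=lambda: {"decoder": [1, 2, 3, 4], "encoder": [2, 3, 4, 5]}
    )
    num_steps: int = 50              # DDIM sampling steps
    eta: float = 1.0                 # DDIM eta
    structure_schedule: float = 0.6  # tau^s
    appearance_schedule: float = 0.6 # tau^a
    recurrence_count: int = 2        # n^r
    recurrence_start: float = 0.1    # tau^r_0
    recurrence_end: float = 0.5      # tau^r_1


def control_steps(num_steps: int, fraction: float) -> list:
    """Step indices i with i / num_steps < fraction (assumed interpretation)."""
    return [i for i in range(num_steps) if i / num_steps < fraction]


def recurrence_steps(num_steps: int, start: float, end: float) -> list:
    """Step indices i with start <= i / num_steps <= end (assumed interpretation)."""
    return [i for i in range(num_steps) if start <= i / num_steps <= end]


cfg = CtrlXSDXLConfig()
structure_active = control_steps(cfg.num_steps, cfg.structure_schedule)
recurrence_active = recurrence_steps(
    cfg.num_steps, cfg.recurrence_start, cfg.recurrence_end
)
```

Under these assumptions, structure and appearance control would cover the first 30 of the 50 sampling steps. Whether the boundaries are inclusive, and whether the schedule is indexed by step count or by noise level, should be confirmed against the released repository before relying on these numbers.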