Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance
Authors: Kuan Heng Lin, Sicheng Mo, Ben Klingher, Fangzhou Mu, Bolei Zhou
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive qualitative and quantitative experiments illustrate the superior performance of Ctrl-X on various condition inputs and model checkpoints. |
| Researcher Affiliation | Collaboration | Kuan Heng Lin¹\*, Sicheng Mo¹\*, Ben Klingher¹, Fangzhou Mu², Bolei Zhou¹; ¹University of California, Los Angeles; ²NVIDIA |
| Pseudocode | No | The paper describes the method using prose and diagrams (Figure 3), but it does not include any formal pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | We publicly release our code and our data (for quantitative evaluation) at https://github.com/genforce/ctrl-x. |
| Open Datasets | Yes | We publicly release our dataset in our code release: https://github.com/genforce/ctrl-x. Our dataset consists of 177 1024×1024 images divided into 16 types and across 7 categories. |
| Dataset Splits | No | The paper mentions creating a new dataset for evaluation ('256 diverse structure-appearance pairs') and describes its composition. However, it does not specify explicit training, validation, or test dataset splits for model evaluation. It only mentions selecting '15 sample pairs' for a user study, which is not a general dataset split for the main quantitative evaluations. |
| Hardware Specification | Yes | We implement Ctrl-X with Diffusers [37] and run all experiments on a single NVIDIA A6000 GPU, except evaluating inference efficiency in Table 1 where we run on a single NVIDIA H100 GPU. |
| Software Dependencies | No | The paper states, 'We implement Ctrl-X with Diffusers [37]', citing the Diffusers library. However, it does not specify a particular version number for Diffusers or any other software dependency, which is required for reproducibility. |
| Experiment Setup | Yes | For SDXL, we set $L_{\text{feat}} = \{0\}_{\text{decoder}}$, $L_{\text{self}} = \{0, 1, 2\}_{\text{decoder}}$, $L_{\text{app}} = \{1, 2, 3, 4\}_{\text{decoder}} \cup \{2, 3, 4, 5\}_{\text{encoder}}$, and $\tau^s = \tau^a = 0.6$. We sample $I^o$ with 50 steps of DDIM sampling and set $\eta = 1$ [33], doing self-recurrence for $n^r = 2$ for $\tau^r_0 = 0.1$ and $\tau^r_1 = 0.5$. |
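The SDXL hyperparameters quoted in the Experiment Setup row can be collected into a single record, which makes them easier to check against a rerun. The sketch below is illustrative only. The class `CtrlXSDXLConfig`, its field names, and the helper `fraction_to_step` are hypothetical and are not taken from the authors' repository. The helper also assumes that a schedule value such as $\tau^s$ is read as a fraction of the 50 DDIM steps; the paper's exact convention for the schedules should be confirmed against the released code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CtrlXSDXLConfig:
    """Hypothetical container for the SDXL settings reported in the paper.

    Field names are illustrative, not the authors' API.
    """

    # Layer indices keyed by UNet block group ("decoder" / "encoder").
    feat_layers: dict = field(default_factory=lambda: {"decoder": (0,)})
    self_attn_layers: dict = field(default_factory=lambda: {"decoder": (0, 1, 2)})
    # L_app is the union of decoder layers {1,2,3,4} and encoder layers {2,3,4,5}.
    app_layers: dict = field(
        default_factory=lambda: {"decoder": (1, 2, 3, 4), "encoder": (2, 3, 4, 5)}
    )
    tau_s: float = 0.6          # structure control schedule
    tau_a: float = 0.6          # appearance control schedule
    num_ddim_steps: int = 50    # DDIM sampling steps
    eta: float = 1.0            # DDIM eta
    n_r: int = 2                # number of self-recurrence repetitions
    tau_r: tuple = (0.1, 0.5)   # (tau^r_0, tau^r_1)


def fraction_to_step(fraction: float, num_steps: int) -> int:
    """Convert a schedule fraction to a step count.

    ASSUMPTION: a schedule value is interpreted as a fraction of the
    total number of sampling steps; this is not confirmed by the source.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    return round(fraction * num_steps)


if __name__ == "__main__":
    cfg = CtrlXSDXLConfig()
    # Under the stated assumption, tau_s = 0.6 of 50 steps is 30 steps.
    print(fraction_to_step(cfg.tau_s, cfg.num_ddim_steps))
    print([fraction_to_step(t, cfg.num_ddim_steps) for t in cfg.tau_r])
```

Freezing the dataclass guards against accidental reassignment of a setting between runs; the layer dictionaries themselves remain mutable, so a stricter version could store them as tuples of pairs.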