Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
Authors: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, Robin Rombach
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | User studies demonstrate that SDXL consistently surpasses all previous versions of Stable Diffusion by a significant margin (see Fig. 1). Table 2: Conditioning on the original spatial size of the training examples improves performance on class-conditional ImageNet (Deng et al., 2009) on 512² resolution. |
| Researcher Affiliation | Academia | No explicit institutional affiliations (university names, company names, or email domains) are provided within the paper's text. |
| Pseudocode | Yes | Algorithm 1 Size and crop-micro-conditioning |
| Open Source Code | No | The paper states 'With SDXL we are releasing an open model' but does not provide a direct link to a source code repository for the described methodology or an explicit statement about its availability (e.g., 'Our code is available at...'). |
| Open Datasets | Yes | We quantitatively assess the effects of this simple but effective conditioning technique by training and evaluating three LDMs on class-conditional ImageNet (Deng et al., 2009) at spatial size 512². |
| Dataset Splits | No | The paper mentions evaluating against 'the full validation set' for ImageNet metrics, but it does not provide specific details on the train/validation/test splits used for their models, particularly for the internal dataset or for the ImageNet models beyond the training set size. |
| Hardware Specification | No | The paper describes training procedures (e.g., 'batchsize of 2048') but does not specify any hardware details such as GPU models, CPU types, or other computing resources used for experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2019)' but does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | First, we pretrain a base model (see Tab. 1) on an internal dataset whose height- and width-distribution is visualized in Fig. 2 for 600 000 optimization steps at a resolution of 256 × 256 pixels and a batch size of 2048, using size- and crop-conditioning as described in Sec. 2.2. We continue training on 512 px for another 200 000 optimization steps, and finally utilize multi-aspect training (Sec. 2.3) in combination with an offset-noise (Guttenberg & Cross Labs, 2023; Lin et al., 2023) level of 0.05 to train the model on different aspect ratios (Sec. 2.3, App. H) of 1024 × 1024 pixel area. |
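The Pseudocode row refers to Algorithm 1 (size- and crop-micro-conditioning). The sketch below is a simplified, illustrative reading of that idea, not the authors' code: it assumes a square target side (as in the 256 × 256 pretraining stage), resizes the shorter side to that target, and takes a uniform random crop along the longer side. It records the original size `c_size` and the crop offset `c_crop`. As we understand the paper, each scalar is embedded with Fourier features and the concatenated vector is added to the timestep embedding; the embedding width `dim=8` and all helper names here (`MicroConditioning`, `resize_shorter_side`, `fourier_embed`, `conditioning_vector`) are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class MicroConditioning:
    """Hypothetical container for the two conditioning signals of one example."""
    c_size: tuple  # (h_original, w_original), measured before any resizing
    c_crop: tuple  # (c_top, c_left) of the crop inside the resized image


def resize_shorter_side(h: int, w: int, target: int) -> tuple:
    """Return (height, width) after scaling so the shorter side equals `target`."""
    scale = target / min(h, w)
    return max(target, round(h * scale)), max(target, round(w * scale))


def size_and_crop_conditioning(h_original: int, w_original: int, target: int,
                               rng: random.Random) -> MicroConditioning:
    """Record c_size, resize the shorter side to `target`, then pick a random
    square crop of side `target` along the longer side and record c_crop."""
    c_size = (h_original, w_original)
    h, w = resize_shorter_side(h_original, w_original, target)
    if h_original <= w_original:
        # Landscape or square: height already equals target, so crop horizontally.
        c_top, c_left = 0, rng.randint(0, w - target)
    else:
        # Portrait: width already equals target, so crop vertically.
        c_top, c_left = rng.randint(0, h - target), 0
    return MicroConditioning(c_size=c_size, c_crop=(c_top, c_left))


def fourier_embed(value: float, dim: int = 8, max_period: float = 10000.0) -> list:
    """Sinusoidal (Fourier-feature) embedding of one scalar; `dim` is illustrative."""
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.cos(value * f) for f in freqs] + [math.sin(value * f) for f in freqs]


def conditioning_vector(cond: MicroConditioning, dim: int = 8) -> list:
    """Embed each of the four scalars independently and concatenate the results."""
    vec = []
    for value in (*cond.c_size, *cond.c_crop):
        vec.extend(fourier_embed(float(value), dim))
    return vec


if __name__ == "__main__":
    rng = random.Random(0)
    cond = size_and_crop_conditioning(300, 600, 256, rng)
    print(cond.c_size, cond.c_crop)  # c_top is 0; c_left lies in [0, 256]
    print(len(conditioning_vector(cond)))  # 4 scalars x 8 dims = 32
```

Because `c_size` keeps the pre-resize resolution, a model trained this way can, in principle, be asked at sampling time to behave as if the input came from a high-resolution, uncropped image (for example `c_crop = (0, 0)`).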
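The Experiment Setup row cites an offset-noise level of 0.05 (Guttenberg & Cross Labs, 2023). The sketch below shows offset noise as it is commonly described, not SDXL's training code. Ordinary Gaussian noise is drawn, and then one extra Gaussian draw per channel, scaled by the offset level, is added to every pixel of that channel. This lets the noise shift a channel's overall mean. It handles a single example stored as nested lists; batched implementations typically draw the shift per sample and per channel. The function name `offset_noise` and the 4 × 8 × 8 shape in the usage line are hypothetical.

```python
import random


def offset_noise(channels: int, height: int, width: int,
                 offset: float = 0.05, rng=None) -> list:
    """Gaussian noise of shape (channels, height, width) plus a per-channel offset.

    Each channel gets one extra N(0, 1) draw, scaled by `offset` and added to every
    pixel in that channel, so the noise can shift a channel's mean brightness.
    """
    rng = rng or random.Random()
    noise = []
    for _ in range(channels):
        shift = offset * rng.gauss(0.0, 1.0)  # shared by all pixels of this channel
        noise.append([[rng.gauss(0.0, 1.0) + shift for _ in range(width)]
                      for _ in range(height)])
    return noise


if __name__ == "__main__":
    # Hypothetical 4-channel latent on an 8 x 8 grid, with the 0.05 offset level.
    eps = offset_noise(4, 8, 8, offset=0.05, rng=random.Random(0))
    print(len(eps), len(eps[0]), len(eps[0][0]))  # 4 8 8
```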