ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models
Authors: Yingqing He, Shaoshu Yang, Haoxin Chen, Xiaodong Cun, Menghan Xia, Yong Zhang, Xintao Wang, Ran He, Qifeng Chen, Ying Shan
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 'Extensive experiments demonstrate that our approach can address the repetition issue well and achieve state-of-the-art performance on higher-resolution image synthesis, especially in texture details.' |
| Researcher Affiliation | Collaboration | Hong Kong University of Science and Technology; Chinese Academy of Sciences; Tencent AI Lab |
| Pseudocode | No | The paper describes the proposed methods (re-dilation, convolution dispersion, noise-damped classifier-free guidance) in detail in the text and with mathematical formulations, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. (Hedged code sketches of re-dilation and noise-damped guidance follow the table.) |
| Open Source Code | No | The abstract mentions a project website: 'More results are available at the project website: https://yingqinghe.github.io/scalecrafter/'. The website states 'Code is coming soon...', so the code is not currently available. |
| Open Datasets | Yes | 'We evaluate performance on the dataset of Laion-5B (Schuhmann et al., 2022) which contains 5 billion image-caption pairs.' |
| Dataset Splits | No | The method is tuning-free and is evaluated on dataset samples: 'When the inference resolution is 1024x1024, we sample 30k images with randomly sampled text prompts from the dataset. Due to massive computation, we sample 10k images when the inference resolution is higher than 1024x1024.' This describes the evaluation protocol, not a train/validation/test split, since the method requires no training or fine-tuning. |
| Hardware Specification | Yes | The paper reports timing on specific hardware: 'Time indicates the second used for synthesizing one image on one A100 GPU with 16-bit precision'. |
| Software Dependencies | No | The paper mentions '16-bit precision' in relation to hardware, and refers to 'diffusers' for naming conventions of layers, but it does not specify any software libraries, frameworks, or their version numbers that would be necessary for reproduction (e.g., PyTorch version, CUDA version). |
| Experiment Setup | Yes | 'We list the hyperparameters for SD 1.5 only for brevity.' The evaluation settings for SD 1.5 are shown in Tab. 6, 7, 8, 9, and the settings for SD XL 1.0 in Tab. 10, 11, 12, 13. These tables list specific values for latent resolution, re-dilated blocks, dilation scale, dispersed blocks, inference timesteps, etc.; see the illustrative settings sketch after this table. |
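
Since the paper ships no pseudocode and its code is not yet released, the following is a minimal sketch of the re-dilation idea as we read it from the paper's description: a pretrained convolution is re-run at inference time with an enlarged dilation (and matching padding), reusing the frozen weights so no tuning is needed. The class name `ReDilatedConv` and the wrapping approach are our own illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F
from torch import nn

class ReDilatedConv(nn.Module):
    """Runs a pretrained Conv2d with an enlarged dilation at inference time,
    reusing its frozen weights. Padding is scaled so that, for odd kernels
    and stride 1, the output spatial size matches the original convolution."""

    def __init__(self, conv: nn.Conv2d, dilation: int):
        super().__init__()
        self.conv = conv
        self.dilation = dilation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        k = self.conv.kernel_size[0]      # e.g. 3 for the UNet's 3x3 convs
        pad = self.dilation * (k // 2)    # keeps H x W unchanged for odd k
        return F.conv2d(
            x,
            self.conv.weight,
            self.conv.bias,
            stride=self.conv.stride,
            padding=pad,
            dilation=self.dilation,
            groups=self.conv.groups,
        )
```

Per the settings tables cited in the Experiment Setup row, re-dilation is applied only to selected UNet blocks and with a per-resolution dilation scale; the wrapper above is just the per-layer primitive one would swap into those blocks.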
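The noise-damped classifier-free guidance can be sketched similarly. As we understand the paper's description, the base denoising estimate comes from the model whose convolutions have been dispersed/re-dilated (strong denoising ability), while the original model contributes only the guidance direction; the exact combination in the paper may differ, and all names below are illustrative.

```python
import torch

@torch.no_grad()
def noise_damped_cfg(
    eps_damped: torch.Tensor,     # unconditional prediction from the dispersed/re-dilated model
    eps_cond: torch.Tensor,       # conditional prediction from the original model
    eps_uncond: torch.Tensor,     # unconditional prediction from the original model
    guidance_scale: float = 7.5,  # illustrative default, not from the paper
) -> torch.Tensor:
    # The damped model anchors the denoising estimate; the original model
    # contributes only the guidance *direction* (eps_cond - eps_uncond),
    # so its high-resolution noise artifacts are damped out.
    return eps_damped + guidance_scale * (eps_cond - eps_uncond)
```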
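For the Experiment Setup row: the paper's Tab. 6-13 tabulate per-resolution settings rather than prose. A hypothetical encoding of one such entry is sketched below; every value and block name is a placeholder (block names follow diffusers conventions, which the paper references for layer naming), and the real numbers must be taken from the paper's tables.

```python
# Placeholder encoding of one per-resolution settings entry (SD 1.5 at
# 1024x1024). All values are illustrative, NOT the paper's actual numbers.
SD15_SETTINGS_1024 = {
    "latent_resolution": (128, 128),                   # 1024 px / 8x VAE downsampling
    "redilated_blocks": ["mid_block", "up_blocks.0"],  # placeholder block names
    "dilation_scale": 2,                               # placeholder
    "dispersed_blocks": ["mid_block"],                 # placeholder
    "inference_timesteps": 50,                         # placeholder
}
```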