Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images

Authors: Qingping Zheng, Yuanfan Guo, Jiankang Deng, Jianhua Han, Ying Li, Songcen Xu, Hang Xu

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the LAION-COCO and MM-CelebA-HQ benchmarks show that ASD can produce well-structured images of arbitrary sizes, cutting inference time by 2× compared to the traditional tiled algorithm.
Researcher Affiliation | Collaboration | Qingping Zheng1,2*, Yuanfan Guo2*, Jiankang Deng2, Jianhua Han2, Ying Li1, Songcen Xu2, Hang Xu2. 1Northwestern Polytechnical University; 2Huawei Noah's Ark Lab.
Pseudocode | No | The paper describes its two-stage pipeline and methods in detail using natural language and diagrams (e.g., Figure 2), but it does not include a clearly labeled pseudocode block or algorithm figure.
Open Source Code | Yes | The source code is available at https://github.com/ProAirVerse/Any-Size-Diffusion.
Open Datasets | Yes | The ARAD stage of ASD is trained on a subset of LAION-Aesthetic (Schuhmann 2022) with 90k text-image pairs in different aspect ratios. It is evaluated on MA-LAION-COCO, with 21,000 images across 21 aspect ratios (selected from LAION-COCO (Schuhmann et al. 2022)), and on MA-COCO, built from MS-COCO (Lin et al. 2014) and containing 2,100 images for those ratios. A test split of MM-CelebA-HQ (Xia et al. 2021), consisting of 2,824 face image pairs in both low and high resolutions, is used to evaluate FSTD and the whole pipeline.
Dataset Splits | No | The paper specifies training data (LAION-Aesthetic) and test sets (MM-CelebA-HQ, MA-LAION-COCO, MA-COCO), but it does not identify a validation split for hyperparameter tuning or early stopping, nor does it give percentages or counts for a general train/validation/test split across all datasets.
Hardware Specification | Yes | Through empirical observation, the authors found that attempts to generate 4K HD images with the SD model trigger out-of-memory errors on a GPU with 32 GB of memory. All tests run on a 32 GB GPU.
Software Dependencies | No | The proposed method is implemented in PyTorch (Paszke et al. 2019). While PyTorch is mentioned, no specific version number is provided for PyTorch or for any other key software library used in the implementation (e.g., CUDA, NumPy, scikit-learn).
Experiment Setup | Yes | A multi-aspect-ratio training method is used to finetune ARAD (with LoRA (Hu et al. 2021)) for 10,000 steps with a batch size of 8, using Adam (Kingma and Ba 2014) with a learning rate of 1.0e-4. FSTD (the second-stage model) is training-free and built upon StableSR (Wang et al. 2023). During inference, a 50-step DDIM sampler (Song, Meng, and Ermon 2020) is used in ARAD to generate the image at the user-defined aspect ratio; in the second stage, following StableSR, a 200-step DDPM sampler (Ho, Jain, and Abbeel 2020) is used for FSTD.
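The reported setup can be summarized as a configuration sketch. This is not the authors' code (the repository should be consulted for the actual implementation); the dictionary keys and the helper function below are illustrative names, and only the numeric values come from the paper.

```python
# Hedged sketch of the experiment setup reported for Any-Size-Diffusion.
# Values (steps, batch size, learning rate, sampler step counts) are from
# the paper; all identifiers here are hypothetical, not the authors' API.

ARAD_FINETUNE = {
    "adapter": "LoRA",        # low-rank adaptation of the SD backbone
    "steps": 10_000,          # finetuning steps
    "batch_size": 8,
    "optimizer": "Adam",
    "learning_rate": 1.0e-4,
}

SAMPLERS = {
    # Stage 1: ARAD generates an image at the user-defined aspect ratio.
    "ARAD": {"sampler": "DDIM", "steps": 50},
    # Stage 2: FSTD is training-free, built on StableSR, using DDPM.
    "FSTD": {"sampler": "DDPM", "steps": 200},
}

def images_seen(cfg: dict) -> int:
    """Total training images processed = steps * batch size."""
    return cfg["steps"] * cfg["batch_size"]

print(images_seen(ARAD_FINETUNE))  # 80000 images over the finetuning run
```

With a batch size of 8 over 10,000 steps, roughly 80k image presentations occur, which matches the scale of the 90k-pair LAION-Aesthetic training subset described above.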