UltraPixel: Advancing Ultra High-Resolution Image Synthesis to New Peaks
Authors: Jingjing Ren, Wenbo Li, Haoyu Chen, Renjing Pei, Bin Shao, Yong Guo, Long Peng, Fenglong Song, Lei Zhu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our model achieves fast training with reduced data requirements, producing photo-realistic high-resolution images and demonstrating state-of-the-art performance in extensive experiments. |
| Researcher Affiliation | Collaboration | Jingjing Ren (1), Wenbo Li (2), Haoyu Chen (1), Renjing Pei (2), Bin Shao (2), Yong Guo (3), Long Peng (2), Fenglong Song (2), Lei Zhu (1,4); affiliations: (1) HKUST (Guangzhou), (2) Huawei Noah's Ark Lab, (3) MPI, (4) HKUST |
| Pseudocode | No | The paper includes diagrams and descriptions of its process but does not contain a formal pseudocode or algorithm block. |
| Open Source Code | Yes | Project page: https://jingjingrenabc.github.io/ultrapixel/. The code repository link is provided on the project page. |
| Open Datasets | Yes | We train models on 1M images of varying resolutions and aspect ratios, ranging from 1024 to 4608, sourced from LAION-Aesthetics [44], SAM [24], and a self-collected high-quality dataset. |
| Dataset Splits | No | The paper mentions training on 1M images and evaluating on 1,000 images but does not explicitly detail training, validation, and test dataset splits with percentages or sample counts. |
| Hardware Specification | Yes | The training is conducted on 8 A100 GPUs with a batch size of 64. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer but does not specify version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | The training is conducted on 8 A100 GPUs with a batch size of 64. We employ the AdamW optimizer [30] with a learning rate of 0.0001. During training, we use continuous timesteps in [0, 1] following [36], while LR guidance is consistently corrupted with noise at timestep t = 0.05. During inference, the generative model uses 20 sampling steps, and the diffusion decoding model uses 10 steps. We adopt DDIM [45] with a classifier-free guidance [19] weight of 4 for latent generation and 1.1 for diffusion decoding. |
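The hyperparameters reported in the Experiment Setup row can be collected into a single configuration sketch. This is an illustrative summary, not the authors' code; the class and field names are assumptions, and only the numeric values come from the paper. The `cfg_combine` helper shows the standard classifier-free guidance formula that the reported weights (4 and 1.1) would plug into.

```python
from dataclasses import dataclass


@dataclass
class UltraPixelSetup:
    """Hyperparameters as reported in the paper; names are illustrative."""
    num_gpus: int = 8                    # A100 GPUs
    batch_size: int = 64
    learning_rate: float = 1e-4          # AdamW optimizer
    lr_guidance_timestep: float = 0.05   # LR guidance corrupted at t = 0.05
    gen_sampling_steps: int = 20         # generative model, DDIM
    decode_sampling_steps: int = 10      # diffusion decoding model, DDIM
    cfg_weight_generation: float = 4.0   # classifier-free guidance, latent generation
    cfg_weight_decoding: float = 1.1     # classifier-free guidance, diffusion decoding


def cfg_combine(eps_uncond: float, eps_cond: float, w: float) -> float:
    """Standard classifier-free guidance: eps_u + w * (eps_c - eps_u)."""
    return eps_uncond + w * (eps_cond - eps_uncond)
```

With `w = 1` the combination reduces to the conditional prediction, so the decoding weight of 1.1 applies only mild guidance compared with the weight of 4 used for latent generation.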