UltraPixel: Advancing Ultra High-Resolution Image Synthesis to New Peaks

Authors: Jingjing Ren, Wenbo Li, Haoyu Chen, Renjing Pei, Bin Shao, Yong Guo, Long Peng, Fenglong Song, Lei Zhu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our model achieves fast training with reduced data requirements, producing photo-realistic high-resolution images and demonstrating state-of-the-art performance in extensive experiments.
Researcher Affiliation | Collaboration | Jingjing Ren1, Wenbo Li2, Haoyu Chen1, Renjing Pei2, Bin Shao2, Yong Guo3, Long Peng2, Fenglong Song2, Lei Zhu1,4; 1HKUST (Guangzhou), 2Huawei Noah's Ark Lab, 3MPI, 4HKUST
Pseudocode | No | The paper includes diagrams and descriptions of its process but does not contain formal pseudocode or an algorithm block.
Open Source Code | Yes | Project page: https://jingjingrenabc.github.io/ultrapixel. The code repository link is provided on the project page: https://jingjingrenabc.github.io/ultrapixel/.
Open Datasets | Yes | We train models on 1M images of varying resolutions and aspect ratios, ranging from 1024 to 4608, sourced from LAION-Aesthetics [44], SAM [24], and a self-collected high-quality dataset.
Dataset Splits | No | The paper mentions training on 1M images and evaluating on 1,000 images but does not explicitly detail training, validation, and test dataset splits with percentages or sample counts.
Hardware Specification | Yes | The training is conducted on 8 A100 GPUs with a batch size of 64.
Software Dependencies | No | The paper mentions using the AdamW optimizer but does not specify version numbers for any software dependencies or libraries.
Experiment Setup | Yes | The training is conducted on 8 A100 GPUs with a batch size of 64. We employ the AdamW optimizer [30] with a learning rate of 0.0001. During training, we use continuous timesteps in [0, 1] as in [36], while LR guidance is consistently corrupted with noise at timestep t = 0.05. During inference, the generative model uses 20 sampling steps, and the diffusion decoding model uses 10 steps. We adopt DDIM [45] with a classifier-free guidance [19] weight of 4 for latent generation and 1.1 for diffusion decoding.
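To make the reported setup concrete, below is a minimal PyTorch sketch of how these hyperparameters fit together. Everything beyond the quoted numbers is an assumption: the ToyDenoiser architecture, the cosine corruption schedule, the additive conditioning, and the noise-prediction loss are illustrative placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Values quoted from the setup above; module names and the noise schedule
# are illustrative assumptions, not the authors' code.
LEARNING_RATE = 1e-4            # AdamW learning rate
BATCH = 64                      # batch size (training ran on 8 A100 GPUs)
LR_GUIDANCE_T = 0.05            # fixed corruption timestep for LR guidance
GEN_STEPS, DEC_STEPS = 20, 10   # DDIM steps: latent generation / diffusion decoding
CFG_GEN, CFG_DEC = 4.0, 1.1     # classifier-free guidance weights

class ToyDenoiser(nn.Module):
    """Stand-in for the generative model; the real architecture differs."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Linear(dim + 1, dim)

    def forward(self, x, t, cond=None):
        h = torch.cat([x, t.expand(x.shape[0], 1)], dim=-1)
        eps = self.net(h)
        return eps if cond is None else eps + cond  # crude additive conditioning

def corrupt(x, t, noise=None):
    """Variance-preserving corruption at a continuous timestep t in [0, 1].
    The exact schedule is not stated in the setup; a cosine ramp is assumed."""
    noise = torch.randn_like(x) if noise is None else noise
    return torch.cos(t * torch.pi / 2) * x + torch.sin(t * torch.pi / 2) * noise

def cfg_predict(model, x, t, cond, w):
    """Classifier-free guidance: blend conditional and unconditional predictions."""
    return model(x, t, None) + w * (model(x, t, cond) - model(x, t, None))

model = ToyDenoiser()
optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)

# One illustrative training step: timesteps are drawn continuously from [0, 1],
# while the LR guidance is always corrupted at the fixed timestep t = 0.05.
x0 = torch.randn(BATCH, 16)                 # clean latents (placeholder data)
lr_guide = corrupt(torch.randn(BATCH, 16), torch.tensor(LR_GUIDANCE_T))
t = torch.rand(BATCH, 1)                    # continuous timesteps in [0, 1]
noise = torch.randn_like(x0)
loss = ((model(corrupt(x0, t, noise), t, lr_guide) - noise) ** 2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

At inference time, cfg_predict(model, x_t, t, cond, CFG_GEN) would supply the guided prediction inside each of the 20 DDIM updates for latent generation, and CFG_DEC would play the same role inside each of the 10 diffusion-decoding steps.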