LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models

Authors: Zecheng Tang, Chenfei Wu, Juntao Li, Nan Duan

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments across a variety of conditional layout generation tasks on three datasets, i.e., Rico (Deka et al., 2017), PubLayNet (Zhong et al., 2019), and Magazine (Zheng et al., 2019), highlight the superiority of our method, in which LayoutNUWA can significantly outperform all the baselines and shows comparable results with the task-specific models.
Researcher Affiliation | Collaboration | Zecheng Tang (1,2), Chenfei Wu (2), Juntao Li (1), Nan Duan (2); 1: Soochow University, 2: Microsoft Research Asia
Pseudocode | No | The paper describes the modules and processes of LayoutNUWA but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/ProjectNUWA/LayoutNUWA.
Open Datasets | Yes | RICO (Deka et al., 2017) is a user interface design dataset for mobile applications containing 25 element categories and 66K+ UI layouts. PubLayNet (Zhong et al., 2019) consists of 360K+ layouts for documents with 5 element categories. Magazine (Zheng et al., 2019) is a low-resource magazine layout dataset containing around 4K annotated layouts and 6 element categories.
Dataset Splits | Yes | We follow LayoutDM (Inoue et al., 2023) to view the original validation data as the testing set and pre-process all three datasets by discarding the layouts containing more than 25 elements as well as splitting the filtered data into the training and new validation sets by 95% and 5%. (A minimal preprocessing sketch appears after the table.)
Hardware Specification | Yes | For model training, we use the DeepSpeed library (Rajbhandari et al., 2020) to run all experiments on 64 NVIDIA V100 GPUs.
Software Dependencies | No | The paper mentions using the DeepSpeed library, LLaMA-X, and Hugging Face but does not specify their version numbers or the versions of other core software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | We set permutation times K = 10 and task numbers T = 3. For model training, we use the DeepSpeed library (Rajbhandari et al., 2020) to run all experiments on 64 NVIDIA V100 GPUs. We apply top-p sampling (Holtzman et al., 2019) for inference, where p = 0.9 and the temperature is 0.6. For the DS settings, we set the learning rate to 5e-5. For the DA settings, we set the learning rate to 5e-6 to prevent model explosion. (A minimal decoding-configuration sketch appears after the table.)
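The Dataset Splits row describes a preprocessing recipe (keep the original validation data as the test set, drop layouts with more than 25 elements, then re-split the filtered data 95%/5% into training and new validation sets) but the paper quotes no code for it. Below is a minimal sketch of that recipe; the `elements` field, file name, and loader are illustrative assumptions, not taken from the released code.

```python
import json
import random

def filter_and_split(layouts, max_elements=25, val_ratio=0.05, seed=0):
    """Drop layouts with more than `max_elements` elements, then split the
    rest into training / new validation sets (95% / 5%), as quoted above."""
    kept = [layout for layout in layouts if len(layout["elements"]) <= max_elements]
    rng = random.Random(seed)
    rng.shuffle(kept)
    n_val = int(len(kept) * val_ratio)
    return kept[n_val:], kept[:n_val]  # (train, new_validation)

# Hypothetical usage: the original validation split is set aside untouched
# as the test set; only the original training data is filtered and re-split.
with open("rico_train.json") as f:
    train_layouts = json.load(f)
train_set, new_val_set = filter_and_split(train_layouts)
```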
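Likewise, the Experiment Setup row fixes the decoding hyperparameters (top-p sampling with p = 0.9, temperature 0.6) without showing how they are applied. A minimal sketch using the Hugging Face transformers generate API follows, assuming a fine-tuned causal LM checkpoint; the checkpoint path, prompt, and generation length are placeholders, not the released weights or settings.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path to a fine-tuned LayoutNUWA-style checkpoint (assumption).
model_name = "path/to/finetuned-layout-llm"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Placeholder conditional layout-generation prompt.
prompt = "<html template with masked element attributes goes here>"
inputs = tokenizer(prompt, return_tensors="pt")

# Nucleus (top-p) sampling with the values reported in the paper;
# max_new_tokens is an arbitrary example value.
outputs = model.generate(
    **inputs,
    do_sample=True,
    top_p=0.9,
    temperature=0.6,
    max_new_tokens=512,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```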