Layout Generation as Intermediate Action Sequence Prediction

Authors: Huiting Yang, Danqing Huang, Chin-Yew Lin, Shengfeng He

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on three datasets with different types of design, including mobile UI, scientific documents, and slides. Both automatic and human evaluations show that our approach performs consistently better than the baselines.
Researcher Affiliation | Collaboration | 1) South China University of Technology, 2) Microsoft Research Asia, 3) Singapore Management University
Pseudocode | No | The paper describes the action schema and derivation rules in paragraph text and mathematical formulas, but does not present them in a structured pseudocode or algorithm block.
Open Source Code | Yes | Code is available at https://github.com/microsoft/KC/tree/main/papers/Layout Action
Open Datasets | Yes | We evaluate on three datasets with different types of graphic designs: Rico (Deka et al. 2017; Liu et al. 2018), PubLayNet (Zhong, Tang, and Yepes 2019), and InfoPPT (Shi et al. 2022).
Dataset Splits | Yes | For all the datasets, we randomly split the dataset into 85% train, 5% validation, and 10% test. (A split sketch follows the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions the AdamW optimizer, the Transformer backbone, and a Python library (python-pptx), but does not specify version numbers for these software dependencies.
Experiment Setup | Yes | We set the Transformer with 6 layers of hidden size 512. The number of attention heads is set to 8. We use the AdamW optimizer (Loshchilov and Hutter 2017) with an initial learning rate of 3e-4, β1 = 0.9, β2 = 0.95, and L2 weight decay of 5e-4. We also apply early stopping, gradient clipping (Pascanu, Mikolov, and Bengio 2013), and warm-up over the initial 1% of training iterations. The dropout rate is set to 0.1. Models are trained for a maximum of 50 epochs with batch size 64. During inference, we sample from the multinomial distribution with top-k sampling (k = 5) for all models. (A configuration sketch follows the table.)
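
To make the reported 85% / 5% / 10% split concrete, here is a minimal sketch of how such a random split could be reproduced. The seed and the shuffling routine are assumptions; the paper reports only the ratios, not how the split was drawn.

```python
import random

def split_dataset(samples, seed=0):
    """Randomly split samples into 85% train, 5% validation, 10% test.

    The seed and the use of Python's `random` module are assumptions;
    the paper states only the split ratios.
    """
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n = len(samples)
    n_train = int(0.85 * n)
    n_val = int(0.05 * n)
    return (
        samples[:n_train],                 # train
        samples[n_train:n_train + n_val],  # validation
        samples[n_train + n_val:],         # test
    )

# Example: split a toy list of 100 layout identifiers.
train, val, test = split_dataset(range(100))
assert len(train) == 85 and len(val) == 5 and len(test) == 10
```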
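
The quoted hyperparameters can be summarized in a short PyTorch sketch. Only the numeric values (hidden size 512, 8 heads, dropout 0.1, AdamW with lr 3e-4, betas 0.9/0.95, weight decay, top-k = 5) come from the paper; the encoder-decoder layout of the 6 layers and the reading of the extracted "5e4" as 5e-4 are assumptions.

```python
import torch
import torch.nn as nn

# Sketch of the reported configuration, assuming a standard PyTorch
# encoder-decoder Transformer. Whether "6 layers" means 6 encoder and
# 6 decoder layers is an assumption; hidden size, heads, and dropout
# follow the quoted setup.
model = nn.Transformer(
    d_model=512,
    nhead=8,
    num_encoder_layers=6,
    num_decoder_layers=6,
    dropout=0.1,
)

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=3e-4,
    betas=(0.9, 0.95),
    weight_decay=5e-4,  # reading the extracted "5e4" as 5e-4 (an assumption)
)

def sample_next_token(logits: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Top-k sampling (k = 5) from the multinomial distribution over the
    k highest-scoring tokens, as described for inference.

    `logits` is a (batch, vocab_size) tensor of unnormalized scores.
    """
    topk_vals, topk_idx = torch.topk(logits, k, dim=-1)
    probs = torch.softmax(topk_vals, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)
    return topk_idx.gather(-1, choice)
```

Early stopping, gradient clipping, and the 1% warm-up schedule mentioned in the setup are not shown in this sketch.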