PLay: Parametrically Conditioned Layout Generation using Latent Diffusion

Authors: Chin-Yi Cheng, Forrest Huang, Gang Li, Yang Li

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method outperforms prior works across three datasets on metrics including FID and FD-VG, and in a user study.
Researcher Affiliation | Industry | Google Research, Mountain View, United States. Correspondence to: Chin-Yi Cheng <cchinyi@google.com>, Yang Li <liyang@google.com>.
Pseudocode | No | No explicit pseudocode or algorithm blocks are provided. The model components are described visually in Figure 14, and the processes are explained in the text.
Open Source Code | No | The paper contains no explicit statement that the authors are releasing code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | We experiment PLay with three publicly available datasets for two different domains: UI and document layouts. CLAY (Li et al., 2022) contains about 50K UI layouts with 24 classes. RICO-Semantic (Liu et al., 2018) contains about 43K UI layouts with 13 classes previously used in VTN. PublayNet (Zhong et al., 2019) contains about 330K document layouts with 5 classes.
Dataset Splits | No | The paper mentions using the CLAY, RICO-Semantic, and PublayNet datasets but does not provide explicit percentages or counts for training, validation, or test splits. It only mentions 'sample size s = 1024' for metric computation, which is not a dataset split.
Hardware Specification | Yes | The model is trained using 8 Google Cloud TPU v4 cores for 47 hours.
Software Dependencies | No | We implemented the proposed architecture in JAX and Flax. The paper names these frameworks but does not specify version numbers for any of its software dependencies.
Experiment Setup | Yes | We use the ADAM optimizer (b1 = 0.9, b2 = 0.98) with 500k steps and a batch size of 128. The learning rate is 0.001 with linear warm-up over 8k steps. For the denoise network ϵ_θ(z_t, τ_ψ(G), t), we use a Transformer encoder to replace the U-Net structure used in image-based DMs and predict the noise ϵ. We also add a small KL-penalty to regularize the latent space while keeping high reconstruction accuracy. In sampling, we use DDPM and CFG with w = 1.5. We also found that discrete coordinate values work better empirically and set the dimensions of each layout to width = 36 and height = 64. We fix the maximum number of elements per layout at N = 128; layouts with fewer elements are padded to the same size, which results in fixed N and D for all layouts. We also fix the maximum number of guidelines per layout at M = 128.
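
To make the training configuration in the Experiment Setup row concrete, here is a minimal sketch of the stated optimizer and learning-rate schedule. It assumes the optax library on top of the paper's JAX/Flax stack; the paper names only JAX and Flax, so optax and the exact warm-up shape are assumptions.

```python
import optax

# Linear warm-up from 0 to the stated learning rate of 1e-3 over 8k steps;
# optax.linear_schedule holds the end value after transition_steps.
learning_rate = optax.linear_schedule(
    init_value=0.0, end_value=1e-3, transition_steps=8_000)

# ADAM with the betas quoted above; training runs for 500k steps
# at a batch size of 128.
optimizer = optax.adam(learning_rate, b1=0.9, b2=0.98)
```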
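The denoise network described above, a Transformer encoder standing in for the U-Net of image diffusion models, could look roughly like the Flax sketch below. The layer count, width, timestep embedding, and the way the guideline embedding τ_ψ(G) is injected (here, concatenated as extra tokens) are all assumptions, not the paper's specification.

```python
import flax.linen as nn
import jax.numpy as jnp

class TransformerDenoiser(nn.Module):
    dim: int = 256
    num_layers: int = 4
    num_heads: int = 8

    @nn.compact
    def __call__(self, z_t, cond, t_emb):
        # Project latents (B, N, D), guideline embeddings (B, M, Dc), and
        # the timestep embedding (B, T) to a shared width, then concatenate
        # along the token axis.
        z = nn.Dense(self.dim)(z_t)
        c = nn.Dense(self.dim)(cond)
        t = nn.Dense(self.dim)(t_emb)[:, None, :]
        x = jnp.concatenate([t, z, c], axis=1)
        for _ in range(self.num_layers):
            # Pre-norm self-attention block with a residual connection.
            h = nn.LayerNorm()(x)
            x = x + nn.SelfAttention(num_heads=self.num_heads)(h)
            # Pre-norm MLP block with a residual connection.
            h = nn.LayerNorm()(x)
            x = x + nn.Dense(self.dim)(nn.gelu(nn.Dense(4 * self.dim)(h)))
        # Predict the noise eps only for the latent tokens.
        return nn.Dense(z_t.shape[-1])(x[:, 1:1 + z_t.shape[1], :])
```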
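For the sampling step, here is a hedged sketch of one classifier-free guidance (CFG) noise estimate with w = 1.5. The paper does not spell out which CFG convention it uses; this uses the common form ϵ_uncond + w·(ϵ_cond − ϵ_uncond), and denoise_fn and null_cond are hypothetical names.

```python
def guided_noise(denoise_fn, z_t, cond, null_cond, t, w=1.5):
    """One CFG noise estimate: run the denoiser conditioned on the
    guidelines G, then on a learned null condition, and extrapolate."""
    eps_cond = denoise_fn(z_t, cond, t)
    eps_uncond = denoise_fn(z_t, null_cond, t)
    return eps_uncond + w * (eps_cond - eps_uncond)
```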
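Finally, the fixed-size representation (N = 128 elements, M = 128 guidelines, with shorter layouts padded) can be sketched as below; the feature dimensions, the zero pad value, and the function names are assumptions for illustration.

```python
import jax.numpy as jnp

N_MAX = 128  # maximum elements per layout (N)
M_MAX = 128  # maximum guidelines per layout (M)

def pad_to_fixed(tokens: jnp.ndarray, max_len: int) -> jnp.ndarray:
    """Zero-pad a (n, D) token array to (max_len, D)."""
    n = tokens.shape[0]
    return jnp.pad(tokens, ((0, max_len - n), (0, 0)))

# Example: a 5-element layout and 3 guidelines, each padded to fixed size.
elements = pad_to_fixed(jnp.zeros((5, 8)), N_MAX)    # -> (128, 8)
guidelines = pad_to_fixed(jnp.zeros((3, 4)), M_MAX)  # -> (128, 4)
```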