Markup-to-Image Diffusion Models with Scheduled Sampling

Authors: Yuntian Deng, Noriyuki Kojima, Alexander M. Rush

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct experiments on four markup datasets: mathematical formulas (LaTeX), table layouts (HTML), sheet music (LilyPond), and molecular images (SMILES)." The experiments verify the effectiveness of both the diffusion process and scheduled sampling for fixing generation issues.
Researcher Affiliation | Academia | 1) Harvard University (dengyuntian@seas.harvard.edu); 2) Cornell University ({nk654,arush}@cornell.edu)
Pseudocode | Yes | Algorithm 1 (Scheduled Sampling) and Algorithm 2 (No Scheduled Sampling) are given in Appendix C.
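To make Algorithm 1 concrete, here is a minimal sketch of one training step with scheduled sampling at m = 1, the setting used in the experiments. This is an illustration, not the authors' code: the names (`model`, `alpha_bar`, `cond`, `ss_rate`) are assumptions, and the model is parameterized to predict the clean image x0 so the ground-truth target stays well defined even when the input at step t is produced by the model itself; the paper's exact objective is specified in Appendix C.

```python
import torch
import torch.nn.functional as F

def scheduled_sampling_step(model, x0, cond, alpha_bar, ss_rate):
    """One training step; with probability ss_rate, the input at step t
    comes from one reverse (model) step from t+1 rather than from the
    forward noising process, mimicking generation-time inputs."""
    T = alpha_bar.shape[0]
    b = x0.shape[0]
    t = torch.randint(0, T - 1, (b,))          # leave room for t + 1
    noise = torch.randn_like(x0)

    def q_sample(step):                        # forward process q(x_step | x0)
        ab = alpha_bar[step].view(-1, 1, 1, 1)
        return ab.sqrt() * x0 + (1 - ab).sqrt() * noise

    x_t = q_sample(t)
    if torch.rand(()) < ss_rate:
        x_t1 = q_sample(t + 1)
        with torch.no_grad():
            x0_hat = model(x_t1, t + 1, cond)  # model's guess of the clean image
            ab_t = alpha_bar[t].view(-1, 1, 1, 1)
            ab_t1 = alpha_bar[t + 1].view(-1, 1, 1, 1)
            alpha_t1 = ab_t1 / ab_t
            # posterior q(x_t | x_{t+1}, x0_hat): DDPM posterior mean/variance
            mean = (alpha_t1.sqrt() * (1 - ab_t) * x_t1
                    + ab_t.sqrt() * (1 - alpha_t1) * x0_hat) / (1 - ab_t1)
            var = (1 - alpha_t1) * (1 - ab_t) / (1 - ab_t1)
            x_t = mean + var.sqrt() * torch.randn_like(x0)

    # the training target is still derived from the ground-truth x0
    return F.mse_loss(model(x_t, t, cond), x0)
```

For m > 1, the same reverse step would be applied m times before computing the loss; this sketch covers only the m = 1 case used in the experiments.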
Open Source Code | Yes | "All models, data, and code are publicly available at https://github.com/da03/markup2im."
Open Datasets | Yes | "We adopt IM2LATEX-100K introduced in Deng et al. (2016)..."; "We generate 35k synthetic LilyPond files..."; "We use a solubility dataset by Wilkinson et al. (2022)..."; and "All models, data, and code are publicly available at https://github.com/da03/markup2im."
Dataset Splits | Yes | Table 1 (markup-to-image datasets) reports train/val/test sizes: Math 55,033/6,072/1,024; Table Layouts 80,000/10,000/1,024; Sheet Music 30,902/989/988; Molecules 17,925/1,000/1,000.
Hardware Specification | Yes | "We use a single Nvidia A100 GPU to train on the Math, Table Layouts, and Molecules datasets"; "We use four A100s to train on the Sheet Music dataset."
Software Dependencies | No | The paper mentions the Hugging Face diffusers library and the Python package RDKit, but does not specify version numbers or other software dependencies required for reproducibility.
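Since no versions are pinned, a reproduction attempt would have to record its own environment. A minimal way to log the two libraries named in the paper (the version attributes below are standard; the snippet itself is not from the paper):

```python
# Record the versions actually used, since the paper does not pin any.
import torch, diffusers
from rdkit import rdBase

print("torch:", torch.__version__)
print("diffusers:", diffusers.__version__)
print("RDKit:", rdBase.rdkitVersion)
```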
Experiment Setup | Yes | "We train all models for 100 epochs using the AdamW optimizer... The learning rate is set to 1e-4 with a cosine decay schedule over 100 epochs and 500 warmup steps. We use a batch size of 16 for all models. For scheduled sampling, we use m = 1. We linearly increase the rate of applying scheduled sampling from 0% to 50% from the beginning of training to the end."
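The quoted hyperparameters map directly onto standard PyTorch/diffusers calls. The sketch below is an assumption-laden illustration, not the authors' training script: `model` is a stand-in for the paper's denoiser, and `steps_per_epoch` is derived from the Math training split purely for illustration; only the hyperparameter values (AdamW, lr 1e-4, cosine schedule with 500 warmup steps, batch size 16, 100 epochs, 0% to 50% ramp) come from the paper.

```python
import torch
from diffusers.optimization import get_cosine_schedule_with_warmup

epochs, batch_size = 100, 16
model = torch.nn.Conv2d(3, 3, 1)        # stand-in for the paper's denoiser
steps_per_epoch = 55_033 // batch_size  # e.g. the Math training split
total_steps = epochs * steps_per_epoch

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
lr_scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=500, num_training_steps=total_steps)

def ss_rate(step: int) -> float:
    """Scheduled-sampling rate, increased linearly from 0% to 50% over training."""
    return 0.5 * step / max(total_steps - 1, 1)
```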