Learning to Jump: Thinning and Thickening Latent Counts for Generative Modeling

Authors: Tianqi Chen, Mingyuan Zhou

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To demonstrate the power and versatility of the proposed learning-to-jump framework for generative modeling, we evaluate JUMP models on a diverse set of non-negative data, ranging from univariate non-negative data of various types, document term-frequency count vectors and TF-IDF vectors obtained from two representative text corpora, to natural images whose pixel values lie between 0 and 255.
Researcher Affiliation | Academia | McCombs School of Business, The University of Texas at Austin. Correspondence to: Tianqi Chen <tqch@utexas.edu>, Mingyuan Zhou <mingyuan.zhou@mccombs.utexas.edu>.
Pseudocode | Yes | Algorithm 1 Training (page 4) and Algorithm 2 Sampling (page 5)
Open Source Code | Yes | Our code is available at https://github.com/tqch/poisson-jump.
Open Datasets | Yes | The 20 Newsgroups dataset comprises 18,846 news posts on 20 topics. (Section 4.2, page 6); The NeurIPS dataset is a collection of 7241 papers published in NeurIPS from 1987 to 2016. (Section 4.2, page 6); http://qwone.com/~jason/20Newsgroups/; https://www.kaggle.com/datasets/benhamner/nips-papers; Table 3. Comparison of different models on CIFAR-10.
Dataset Splits | No | We draw 100,000 random samples from each distribution to form the training data. (An illustrative sampling sketch follows the table.)
Hardware Specification | No | The authors acknowledge the support of NSF-IIS 2212418, the Fall 2022 McCombs REG award, the NSF AI Institute for Foundations of Machine Learning (IFML), and Texas Advanced Computing Center (TACC).
Software Dependencies | No | For all the univariate and document-type datasets, our models are trained for 600 epochs using the Adam optimizer (Kingma & Ba, 2015). We use a fixed learning rate of 0.001 and the default values for the parameters β1 = 0.9 and β2 = 0.999. In the case of CIFAR-10 image generation, we utilize the AdamW optimizer (Loshchilov & Hutter, 2019) with a learning rate of 0.0002 and a weight decay of 0.001. (An illustrative optimizer sketch follows the table.)
Experiment Setup | Yes | We set β1 to 0.001 by default. The value of βT is selected such that the log-SNR (Signal-to-Noise Ratio) will be approximately −12 on average at the end of the forward chain, ensuring that the loss of the last time step LT is approximately 0. (Appendix A.1, page 9); For all the univariate and document-type datasets, our models are trained for 600 epochs using the Adam optimizer (Kingma & Ba, 2015). We use a fixed learning rate of 0.001 and the default values for the parameters β1 = 0.9 and β2 = 0.999. (Appendix A.3, page 9); For CIFAR-10 image generation, we use a UNet model architecture similar to the one used by Nichol & Dhariwal (2021). Our UNet model has three stages through downsampling and upsampling, which correspond to spatial dimensions of 32×32, 16×16, and 8×8. Each stage of the model consists of 3 residual blocks with 128 hidden channels followed by a self-attention layer except for the first stage. In addition, we use a dropout rate of 0.2 for extra regularization. (Appendix B, page 10) (An illustrative configuration sketch follows the table.)
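
The Dataset Splits row quotes that 100,000 random samples are drawn from each univariate distribution to form the training data. The minimal sketch below illustrates that step; the specific distributions and their parameters are placeholders for illustration, not the ones used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train = 100_000  # 100,000 samples per distribution, as quoted above

# Placeholder non-negative distributions; the paper's exact univariate choices
# are not listed in this report.
training_sets = {
    "poisson": rng.poisson(lam=5.0, size=n_train),
    "gamma": rng.gamma(shape=2.0, scale=1.5, size=n_train),
    "half_normal": np.abs(rng.normal(loc=0.0, scale=1.0, size=n_train)),
}
```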
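
The Software Dependencies row quotes the optimizer settings from Appendix A.3. The sketch below simply wires those numbers into PyTorch optimizers; assuming a PyTorch implementation is an inference from the released repository, and the helper name build_optimizer is hypothetical.

```python
import torch

def build_optimizer(model: torch.nn.Module, dataset_type: str) -> torch.optim.Optimizer:
    """Hypothetical helper collecting the optimizer settings quoted in Appendix A.3."""
    if dataset_type in ("univariate", "document"):
        # Adam with a fixed learning rate of 0.001 and default betas (0.9, 0.999);
        # training runs for 600 epochs.
        return torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
    if dataset_type == "cifar10":
        # AdamW with learning rate 0.0002 and weight decay 0.001.
        return torch.optim.AdamW(model.parameters(), lr=2e-4, weight_decay=1e-3)
    raise ValueError(f"unknown dataset type: {dataset_type}")
```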
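
The Experiment Setup row lists the noise-schedule defaults and the CIFAR-10 UNet hyperparameters. The configuration dictionary below restates those quoted values in one place; the key names are illustrative assumptions and may not match the released poisson-jump code.

```python
# Illustrative configuration; key names are assumptions, values are quoted from the paper.
cifar10_config = {
    "schedule": {
        "beta_1": 1e-3,               # default starting value (Appendix A.1)
        "target_end_log_snr": -12.0,  # beta_T is chosen so the average terminal log-SNR
                                      # makes the last-step loss L_T approximately 0
    },
    "unet": {
        "stage_resolutions": (32, 16, 8),  # three stages via downsampling/upsampling
        "res_blocks_per_stage": 3,
        "hidden_channels": 128,
        "self_attention_per_stage": (False, True, True),  # no attention at the 32x32 stage
        "dropout": 0.2,                    # extra regularization (Appendix B)
    },
}
```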