Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions
Authors: Emiel Hoogeboom, Didrik Nielsen, Priyank Jaini, Patrick Forré, Max Welling
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that our method outperforms existing dequantization approaches on text modelling and modelling on image segmentation maps in log-likelihood. In our experiments we compare the performance of our methods on language modelling tasks and learning image segmentation maps unconditionally. |
| Researcher Affiliation | Academia | Emiel Hoogeboom¹, Didrik Nielsen², Priyank Jaini¹, Patrick Forré³, Max Welling¹ (¹UvA-Bosch Delta Lab, University of Amsterdam; ²Technical University of Denmark; ³University of Amsterdam) |
| Pseudocode | Yes | Algorithm 1: Sampling from Argmax Flows; Algorithm 2: Optimizing Argmax Flows; Algorithm 3: Thresholding-based q(v|x); Algorithm 4: Gumbel-based q(v|x) (a sketch of the thresholding step appears below the table) |
| Open Source Code | No | No statement or link regarding open-source code for their method. |
| Open Datasets | Yes | In this section we compare our methods on two language datasets, text8 and enwik8. For image-type data, we introduce a categorical image dataset: the Cityscapes dataset is repurposed for unconditional image segmentation learning. Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes Dataset for Semantic Urban Scene Understanding. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016; pp. 3213–3223. |
| Dataset Splits | No | The Multinomial Diffusion model performs somewhat worse with 0.37 bpp on test whereas it scored 0.33 bpp on train. Interestingly, this is the only model where overfitting was an issue and data augmentation was required, which may explain part of the performance difference. For all other models, training performance was comparable to test and validation performance. |
| Hardware Specification | No | No specific hardware details are provided. |
| Software Dependencies | No | In the multinomial text diffusion model, the µ network is modeled by a 12-layer Transformer. The density model p(v) is defined using affine coupling layers parametrized by DenseNets (Huang et al., 2017). (A sketch of the diffusion forward process appears below the table.) |
| Experiment Setup | No | Model description: Two versions of generative argmax flows are tested, using an autoregressive (AR) flow and a coupling-based flow for p(v). In these experiments the probabilistic inverse is based on the thresholding approach. Specifically, a conditional diagonal Gaussian q(u|x) is trained and thresholded, which gives the distribution q(v|x). The argmax flow is defined on binary Cartesian products: for K = 27 a 5-dimensional binary space is used, and for K = 256 an 8-dimensional binary space (a sketch of this encoding appears below the table). The argmax flow is compared to the current standard of training generative flows directly on discrete data: dequantization. We compare to both uniform and variational dequantization, where noise on the (0, 1) interval is added to the one-hot representation of the categorical data. The autoregressive density model is based on the model proposed by Lippe and Gavves (2020). The coupling density model consists of 8 flow layers, where each layer consists of a 1×1 convolution and mixture-of-logistics transformations (Ho et al., 2019). In the multinomial text diffusion model, the µ network is modeled by a 12-layer Transformer. For more extensive details about the experiment setup, see Appendix B. |
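The Pseudocode row lists a thresholding-based probabilistic inverse q(v|x) (Algorithm 3 in the paper): noise u is sampled from a conditional Gaussian, the coordinate at the observed class index stays free, and every other coordinate is pushed strictly below it with a softplus so that the argmax constraint holds. The PyTorch sketch below is a minimal illustration of that step; the function name, shapes, and usage check are assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def thresholding_inverse(u: torch.Tensor, x: torch.Tensor):
    """Map unconstrained noise u (batch, K) to v with argmax(v) = x.

    The coordinate at index x stays free (v_x = u_x); every other
    coordinate is pushed strictly below it via the smooth, invertible
    map v_i = v_x - softplus(v_x - u_i). Returns v together with the
    log |det Jacobian| of the map, needed for the variational bound.
    """
    t = u.gather(1, x.unsqueeze(1))          # (batch, 1): the free coordinate u_x
    v = t - F.softplus(t - u)                # threshold all coordinates below t
    v = v.scatter(1, x.unsqueeze(1), t)      # keep v_x = u_x exactly
    # dv_i/du_i = sigmoid(t - u_i) for i != x; coordinate x is the identity.
    log_det = F.logsigmoid(t - u).scatter(1, x.unsqueeze(1), torch.zeros_like(t))
    return v, log_det.sum(dim=1)

# Quick check: the argmax constraint holds for arbitrary noise.
u = torch.randn(4, 27)
x = torch.randint(0, 27, (4,))
v, log_det = thresholding_inverse(u, x)
assert torch.equal(v.argmax(dim=1), x)
```

Because softplus is strictly positive, every thresholded coordinate lands strictly below v_x, so the argmax constraint can never be violated.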
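The Multinomial Diffusion model referenced in several rows corrupts categorical data by gradually interpolating towards a uniform distribution over the K classes; the paper gives the closed-form marginal q(x_t | x_0) = Cat(ᾱ_t x_0 + (1 − ᾱ_t)/K) with ᾱ_t = ∏(1 − β_s). Below is a minimal sketch of that forward noising step, assuming PyTorch; the linear beta schedule and batch shapes are illustrative assumptions rather than the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def q_xt_given_x0(x0_onehot: torch.Tensor, alpha_bar_t: float, K: int):
    """Closed-form marginal of the categorical forward process:
    with probability alpha_bar_t the original class survives, otherwise
    the class is resampled uniformly over the K categories."""
    probs = alpha_bar_t * x0_onehot + (1.0 - alpha_bar_t) / K
    return torch.distributions.Categorical(probs=probs)

# Usage: noise a batch of class labels at step t of a T-step schedule.
K, T, t = 27, 1000, 250
betas = torch.linspace(1e-4, 0.02, T)             # illustrative linear schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)     # alpha_bar_t = prod(1 - beta_s)
x0 = F.one_hot(torch.randint(0, K, (8,)), K).float()
xt = q_xt_given_x0(x0, alpha_bar[t].item(), K).sample()   # (8,) noisy labels
```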
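Finally, the experiment setup defines the argmax flow on binary Cartesian products, so each of K categories is represented by ceil(log2 K) binary dimensions: 5 for K = 27 and 8 for K = 256. A hedged sketch of that index-to-bits mapping, assuming PyTorch; the helper name is hypothetical.

```python
import math
import torch

def to_binary(labels: torch.Tensor, K: int) -> torch.Tensor:
    """Encode integer class labels (batch,) as bits (batch, num_bits)."""
    num_bits = math.ceil(math.log2(K))            # 5 for K = 27, 8 for K = 256
    shifts = torch.arange(num_bits)
    return (labels.unsqueeze(-1) >> shifts) & 1   # least-significant bit first

print(to_binary(torch.tensor([0, 1, 26]), K=27))  # three 5-dimensional codes
```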