Discrete Flows: Invertible Generative Models of Discrete Data

Authors: Dustin Tran, Keyon Vafa, Kumar Agrawal, Laurent Dinh, Ben Poole

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we find that discrete autoregressive flows outperform autoregressive baselines on synthetic discrete distributions, an addition task, and Potts models; and bipartite flows can obtain competitive performance with autoregressive baselines on character-level language modeling for Penn Tree Bank and text8.
Researcher Affiliation | Collaboration | 1 Google Brain, 2 Columbia University
Pseudocode | No | No pseudocode or algorithm blocks were found.
Open Source Code | No | No explicit statement or link to open-source code for the described methodology was found.
Open Datasets | Yes | We use Penn Tree Bank with minimal processing from Mikolov et al. (2012), consisting of roughly 5M characters and a vocabulary size of K = 51. We also evaluated on text8, using the preprocessing of Mikolov et al. (2012); Zhang et al. (2016) with 100M characters and a vocabulary size of K = 27.
Dataset Splits | Yes | We split the data into 90M characters for train, 5M characters for dev, and 5M characters for test.
Hardware Specification | Yes | Table 3 compares the test negative log-likelihood in bits per character as well as the time to generate a 288-dimensional sequence of tokens on a NVIDIA P100 GPU.
Software Dependencies | No | No specific software dependencies with version numbers were provided.
Experiment Setup | Yes | For the network for the autoregressive base distribution and location parameters of the flows, we used a Transformer with 64 hidden units. We used a composition of 1 flow for the autoregressive flow models, and 4 flows for the bipartite flow models. All LSTMs use 256 hidden units for D = 10, and 512 hidden units for D = 20. For discrete bipartite flows, we use a batch size of 128, sequence length of 256, a varying number of flows, and parameterize each flow with a Transformer with 2 or 3 layers, 512 hidden units, 2048 filter size, and 8 heads.
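As a quick reference, the sketch below collects the quoted Experiment Setup and Dataset Splits details into plain Python dictionaries. This is a minimal summary sketch, not the authors' released code: the variable names and dictionary keys (e.g. char_lm_bipartite_config, flow_network) are hypothetical, while the numeric values come from the excerpts above.

# Hypothetical configuration summary of the reported setup; names are not from the paper.

char_lm_bipartite_config = {
    "dataset": "text8",                     # 100M characters, vocabulary size K = 27
    "split_chars": {"train": 90_000_000,    # 90M / 5M / 5M character split
                    "dev": 5_000_000,
                    "test": 5_000_000},
    "batch_size": 128,
    "sequence_length": 256,
    "num_flows": "varied",                  # the paper varies the number of bipartite flows
    "flow_network": {                       # Transformer parameterizing each flow
        "layers": (2, 3),                   # 2 or 3 layers, depending on the run
        "hidden_units": 512,
        "filter_size": 2048,
        "attention_heads": 8,
    },
}

toy_experiment_config = {
    "base_and_location_network": {"type": "Transformer", "hidden_units": 64},
    "autoregressive_flow": {"num_flows": 1},
    "bipartite_flow": {"num_flows": 4},
    "lstm_hidden_units": {"D=10": 256, "D=20": 512},
}

if __name__ == "__main__":
    from pprint import pprint
    pprint(char_lm_bipartite_config)
    pprint(toy_experiment_config)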