Discrete Flows: Invertible Generative Models of Discrete Data
Authors: Dustin Tran, Keyon Vafa, Kumar Krishna Agrawal, Laurent Dinh, Ben Poole
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we find that discrete autoregressive flows outperform autoregressive baselines on synthetic discrete distributions, an addition task, and Potts models; and bipartite flows can obtain competitive performance with autoregressive baselines on character-level language modeling for Penn Tree Bank and text8. *(A sketch of the flow transform these models share appears below the table.)* |
| Researcher Affiliation | Collaboration | Google Brain; Columbia University |
| Pseudocode | No | No pseudocode or algorithm blocks were found. |
| Open Source Code | No | No explicit statement or link to open-source code for the described methodology was found. |
| Open Datasets | Yes | We use Penn Tree Bank with minimal processing from Mikolov et al. (2012), consisting of roughly 5M characters and a vocabulary size of K = 51. We also evaluated on text8, using the preprocessing of Mikolov et al. (2012); Zhang et al. (2016) with 100M characters and a vocabulary size of K = 27. |
| Dataset Splits | Yes | We split the data into 90M characters for train, 5M characters for dev, and 5M characters for test. *(This split refers to text8; a sketch of the slicing appears below the table.)* |
| Hardware Specification | Yes | Table 3 compares the test negative log-likelihood in bits per character as well as the time to generate a 288-dimensional sequence of tokens on a NVIDIA P100 GPU. |
| Software Dependencies | No | No specific software dependencies with version numbers were provided. |
| Experiment Setup | Yes | For the network for the autoregressive base distribution and location parameters of the flows, we used a Transformer with 64 hidden units. We used a composition of 1 flow for the autoregressive flow models, and 4 flows for the bipartite flow models. All LSTMs use 256 hidden units for D = 10, and 512 hidden units for D = 20. For discrete bipartite flows, we use a batch size of 128, sequence length of 256, a varying number of flows, and parameterize each flow with a Transformer with 2 or 3 layers, 512 hidden units, 2048 filter size, and 8 heads. *(These hyperparameters are collected into a configuration sketch below the table.)* |
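For context on the method the Research Type row describes: both the autoregressive and bipartite variants in the paper are built from a modular affine transform, y_d = (μ_d + σ_d · x_d) mod K, which is invertible whenever each scale σ_d is coprime to the vocabulary size K. The sketch below is a minimal NumPy illustration of that transform and its inverse; the function names and toy values are ours, not the paper's, and the paper itself releases no code.

```python
import numpy as np

def mod_inverse(sigma, K):
    """Modular multiplicative inverse of sigma (mod K).

    Requires gcd(sigma, K) == 1, the paper's invertibility
    condition on the scale. Uses Python 3.8+ pow().
    """
    return pow(int(sigma), -1, K)

def discrete_flow_forward(x, mu, sigma, K):
    """Forward transform y = (mu + sigma * x) mod K, elementwise."""
    return (mu + sigma * x) % K

def discrete_flow_inverse(y, mu, sigma, K):
    """Inverse transform x = sigma^{-1} * (y - mu) mod K."""
    sigma_inv = np.vectorize(lambda s: mod_inverse(s, K))(sigma)
    return (sigma_inv * (y - mu)) % K

# Round-trip check on a toy sequence with vocabulary size K = 51
# (the Penn Tree Bank vocabulary size quoted above).
K = 51
x = np.array([3, 17, 42, 0])
mu = np.array([5, 1, 9, 30])
sigma = np.array([2, 7, 4, 13])  # all coprime to 51 = 3 * 17
y = discrete_flow_forward(x, mu, sigma, K)
assert np.array_equal(discrete_flow_inverse(y, mu, sigma, K), x)
```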
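The 90M/5M/5M split quoted in the Dataset Splits row is the standard text8 convention. A minimal sketch, assuming the raw 100M-character text8 file is on disk (the filename is an assumption, not from the paper):

```python
# Standard text8 split: first 90M characters for train,
# next 5M for dev, final 5M for test.
with open("text8", "r") as f:  # filename is an assumption
    data = f.read()

train = data[:90_000_000]
dev = data[90_000_000:95_000_000]
test = data[95_000_000:]
```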
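The bipartite-flow hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration object for reimplementation. The field names below are illustrative; since no code was released, none of them come from an official codebase.

```python
from dataclasses import dataclass

@dataclass
class BipartiteFlowConfig:
    """Hyperparameters quoted in the Experiment Setup row above.

    Field names are illustrative, not from the paper's code.
    """
    batch_size: int = 128
    sequence_length: int = 256
    num_flows: int = 4            # paper composes 4 bipartite flows
    transformer_layers: int = 2   # paper reports 2 or 3 layers
    hidden_units: int = 512
    filter_size: int = 2048
    num_heads: int = 8

config = BipartiteFlowConfig()
```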