Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Discrete Flows: Invertible Generative Models of Discrete Data
Authors: Dustin Tran, Keyon Vafa, Kumar Agrawal, Laurent Dinh, Ben Poole
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we find that discrete autoregressive flows outperform autoregressive baselines on synthetic discrete distributions, an addition task, and Potts models; and bipartite flows can obtain competitive performance with autoregressive baselines on character-level language modeling for Penn Tree Bank and text8. |
| Researcher Affiliation | Collaboration | Google Brain; Columbia University |
| Pseudocode | No | No pseudocode or algorithm blocks were found. |
| Open Source Code | No | No explicit statement or link to open-source code for the described methodology was found. |
| Open Datasets | Yes | We use Penn Tree Bank with minimal processing from Mikolov et al. (2012), consisting of roughly 5M characters and a vocabulary size of K = 51. We also evaluated on text8, using the preprocessing of Mikolov et al. (2012); Zhang et al. (2016) with 100M characters and a vocabulary size of K = 27. |
| Dataset Splits | Yes | We split the data into 90M characters for train, 5M characters for dev, and 5M characters for test. |
| Hardware Specification | Yes | Table 3 compares the test negative log-likelihood in bits per character as well as the time to generate a 288-dimensional sequence of tokens on a NVIDIA P100 GPU. |
| Software Dependencies | No | No specific software dependencies with version numbers were provided. |
| Experiment Setup | Yes | For the network for the autoregressive base distribution and location parameters of the flows, we used a Transformer with 64 hidden units. We used a composition of 1 flow for the autoregressive flow models, and 4 flows for the bipartite flow models. All LSTMs use 256 hidden units for D = 10, and 512 hidden units for D = 20. For discrete bipartite flows, we use a batch size of 128, sequence length of 256, a varying number of flows, and parameterize each flow with a Transformer with 2 or 3 layers, 512 hidden units, 2048 filter size, and 8 heads. |
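The text8 setup described in the Open Datasets, Dataset Splits, and Experiment Setup rows can be sketched as a small preprocessing pipeline. This is a minimal sketch, not the authors' code (the table records no released code). It assumes that the text8 character set is the space plus lowercase a to z (consistent with K = 27), that the 90M/5M/5M split is taken as contiguous segments in file order, and that training sequences are non-overlapping chunks. The helper names `encode`, `contiguous_split`, `to_sequences`, and `bits_per_char` are hypothetical.

```python
import math

# Assumption: text8 contains only the space and lowercase a-z,
# which gives the vocabulary size K = 27 reported for text8.
VOCAB = " abcdefghijklmnopqrstuvwxyz"
CHAR_TO_ID = {ch: i for i, ch in enumerate(VOCAB)}
K = len(VOCAB)  # 27


def encode(text):
    """Map each character to an integer token in {0, ..., K - 1}."""
    return [CHAR_TO_ID[ch] for ch in text]


def contiguous_split(tokens, n_train, n_dev, n_test):
    """Cut a token stream into contiguous train/dev/test segments, in that order.

    Assumption: the split is contiguous; the source only gives the sizes.
    """
    if n_train + n_dev + n_test > len(tokens):
        raise ValueError("requested split is larger than the corpus")
    train = tokens[:n_train]
    dev = tokens[n_train:n_train + n_dev]
    test = tokens[n_train + n_dev:n_train + n_dev + n_test]
    return train, dev, test


def to_sequences(tokens, seq_len):
    """Chunk tokens into non-overlapping sequences of length seq_len, dropping the tail."""
    n = len(tokens) // seq_len
    return [tokens[i * seq_len:(i + 1) * seq_len] for i in range(n)]


def bits_per_char(total_nll_nats, num_chars):
    """Convert a summed negative log-likelihood in nats to bits per character."""
    return total_nll_nats / (num_chars * math.log(2))


# Toy usage on a 100-character stand-in, scaled down 1e6x from the
# 90M / 5M / 5M character split used for text8.
corpus = ("the quick brown fox jumps over the lazy dog " * 3)[:100]
tokens = encode(corpus)
train, dev, test = contiguous_split(tokens, 90, 5, 5)
seqs = to_sequences(train, 8)  # the bipartite-flow setup uses sequence length 256
```

On the full corpus the call would be `contiguous_split(tokens, 90_000_000, 5_000_000, 5_000_000)` followed by `to_sequences(train, 256)`. As a sanity check on `bits_per_char`, a model that is uniform over the K = 27 characters scores log2(27), roughly 4.75 bits per character.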