Discrete Flows: Invertible Generative Models of Discrete Data

Authors: Dustin Tran, Keyon Vafa, Kumar Agrawal, Laurent Dinh, Ben Poole

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we find that discrete autoregressive flows outperform autoregressive baselines on synthetic discrete distributions, an addition task, and Potts models; and bipartite flows can obtain competitive performance with autoregressive baselines on character-level language modeling for Penn Tree Bank and text8.
Researcher Affiliation | Collaboration | 1 Google Brain, 2 Columbia University
Pseudocode | No | No pseudocode or algorithm blocks were found.
Open Source Code | No | No explicit statement or link to open-source code for the described methodology was found.
Open Datasets | Yes | We use Penn Tree Bank with minimal processing from Mikolov et al. (2012), consisting of roughly 5M characters and a vocabulary size of K = 51. We also evaluated on text8, using the preprocessing of Mikolov et al. (2012); Zhang et al. (2016) with 100M characters and a vocabulary size of K = 27.
Dataset Splits | Yes | We split the data into 90M characters for train, 5M characters for dev, and 5M characters for test.
Hardware Specification | Yes | Table 3 compares the test negative log-likelihood in bits per character as well as the time to generate a 288-dimensional sequence of tokens on a NVIDIA P100 GPU.
Software Dependencies | No | No specific software dependencies with version numbers were provided.
Experiment Setup | Yes | For the network for the autoregressive base distribution and location parameters of the flows, we used a Transformer with 64 hidden units. We used a composition of 1 flow for the autoregressive flow models, and 4 flows for the bipartite flow models. All LSTMs use 256 hidden units for D = 10, and 512 hidden units for D = 20. For discrete bipartite flows, we use a batch size of 128, sequence length of 256, a varying number of flows, and parameterize each flow with a Transformer with 2 or 3 layers, 512 hidden units, 2048 filter size, and 8 heads.
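As a quick reference, the sketch below collects the quoted Experiment Setup and Dataset Splits details into plain Python dictionaries. This is a minimal summary sketch, not the authors' released code: the variable names and dictionary keys (e.g. char_lm_bipartite_config, flow_network) are hypothetical, while the numeric values come from the excerpts above.

# Hypothetical configuration summary of the reported setup; names are not from the paper.

char_lm_bipartite_config = {
    "dataset": "text8",                     # 100M characters, vocabulary size K = 27
    "split_chars": {"train": 90_000_000,    # 90M / 5M / 5M character split
                    "dev": 5_000_000,
                    "test": 5_000_000},
    "batch_size": 128,
    "sequence_length": 256,
    "num_flows": "varied",                  # the paper varies the number of bipartite flows
    "flow_network": {                       # Transformer parameterizing each flow
        "layers": (2, 3),                   # 2 or 3 layers, depending on the run
        "hidden_units": 512,
        "filter_size": 2048,
        "attention_heads": 8,
    },
}

toy_experiment_config = {
    "base_and_location_network": {"type": "Transformer", "hidden_units": 64},
    "autoregressive_flow": {"num_flows": 1},
    "bipartite_flow": {"num_flows": 4},
    "lstm_hidden_units": {"D=10": 256, "D=20": 512},
}

if __name__ == "__main__":
    from pprint import pprint
    pprint(char_lm_bipartite_config)
    pprint(toy_experiment_config)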