Towards Text Generation with Adversarially Learned Neural Outlines

Authors: Sandeep Subramanian, Sai Rajeswar Mudumba, Alessandro Sordoni, Adam Trischler, Aaron C. Courville, Chris Pal

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our quantitative evaluations suggest that conditioning information from generated outlines can guide the autoregressive model to produce realistic samples, comparable to maximum-likelihood trained language models, even at high temperatures with multinomial sampling. Qualitative results also demonstrate that this generative procedure yields natural-looking sentences and interpolations.
Researcher Affiliation | Collaboration | Sandeep Subramanian (1,2,4), Sai Rajeswar (1,2,5), Alessandro Sordoni (4), Adam Trischler (4), Aaron Courville (1,2,6), Christopher Pal (1,3,5). Affiliations: 1 Montréal Institute for Learning Algorithms, 2 Université de Montréal, 3 École Polytechnique de Montréal, 4 Microsoft Research Montréal, 5 Element AI, Montréal, 6 CIFAR Fellow.
Pseudocode | No | The paper describes algorithms in text and equations, e.g., $\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_D}[\log D(E(x))] + \mathbb{E}_{z \sim P(z)}[\log(1 - D(G(z)))]$ and $h_t = h_{t-1} + \alpha \nabla_{h_{t-1}} \log P(x_2 \mid h_{t-1})$, but does not include structured pseudocode blocks. (A hedged sketch of the latter update appears after this table.)
Open Source Code | No | No code release for this paper's own method is indicated; the only repository referenced is the official code for the ARAE baseline at https://github.com/jakezhaojb/ARAE.
Open Datasets | Yes | We consider the SNLI [4], BookCorpus [59], and WMT15 (English fraction of the En-De parallel corpus) datasets in unconditional text generation experiments.
Dataset Splits | No | In all settings, we partition the dataset into equally sized halves, one on which we train our generative model (GAN & Decoder) and the other for evaluation.
Hardware Specification | Yes | We thank NVIDIA for donating a DGX-1 computer used in this work and Fonds de recherche du Québec Nature et technologies for funding.
Software Dependencies | No | We are also grateful to the PyTorch development team [38]. We trained all models with the Adam [27] stochastic optimization algorithm with a learning rate of 2e-4 and β1 = 0.5, β2 = 0.9.
Experiment Setup | Yes | In our generator and discriminator, we use 5-layer MLPs with 1024 hidden dimensions and leaky ReLU activation functions. We use the WGAN-GP formulation [19] in all experiments, with 5 discriminator updates for every generator update and a gradient penalty coefficient of 10. Our decoder architecture is identical to the multi-task decoders used in [51]. We trained all models with the Adam [27] stochastic optimization algorithm with a learning rate of 2e-4 and β1 = 0.5, β2 = 0.9. We used a noise radius of 0.12 for experiments involving SNLI and 0.2 for others.
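
The Experiment Setup row above contains enough detail to outline the adversarial training configuration. The following is a minimal PyTorch sketch, not the authors' code: the embedding and noise dimensions, the batch handling, and the helper names (mlp, gradient_penalty, train_step) are assumptions for illustration. Only the MLP depth and width, the leaky ReLU activations, the WGAN-GP gradient penalty coefficient of 10, the 5:1 critic-to-generator update ratio, and the Adam hyperparameters come from the quoted setup.

```python
# Hypothetical sketch of the adversarial setup quoted in the Experiment Setup row.
# Dimensions and data handling are placeholders, not values from the paper.
import torch
import torch.nn as nn


def mlp(in_dim, out_dim, hidden=1024, layers=5):
    """Build a `layers`-layer MLP with leaky ReLU activations."""
    blocks, dim = [], in_dim
    for _ in range(layers - 1):
        blocks += [nn.Linear(dim, hidden), nn.LeakyReLU(0.2)]
        dim = hidden
    blocks.append(nn.Linear(dim, out_dim))
    return nn.Sequential(*blocks)


def gradient_penalty(critic, real, fake, coeff=10.0):
    """WGAN-GP penalty on interpolates between real and generated embeddings."""
    alpha = torch.rand(real.size(0), 1, device=real.device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = critic(interp)
    grads, = torch.autograd.grad(scores.sum(), interp, create_graph=True)
    return coeff * ((grads.norm(2, dim=1) - 1) ** 2).mean()


noise_dim, emb_dim = 100, 768              # placeholder sizes, not from the paper
G = mlp(noise_dim, emb_dim)                # generator: noise -> sentence embedding
D = mlp(emb_dim, 1)                        # critic: embedding -> scalar score
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.9))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.9))


def train_step(real_emb):
    """One generator update preceded by 5 critic updates, as described in the paper."""
    for _ in range(5):
        z = torch.randn(real_emb.size(0), noise_dim)
        fake = G(z).detach()
        loss_d = D(fake).mean() - D(real_emb).mean() + gradient_penalty(D, real_emb, fake)
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    z = torch.randn(real_emb.size(0), noise_dim)
    loss_g = -D(G(z)).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```

In this reading, the GAN operates on fixed-length sentence embeddings produced by a pretrained encoder, and the decoder (not shown) maps sampled embeddings back to text; the noise radius mentioned in the setup would perturb encoder outputs during decoder training.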
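The second equation quoted in the Pseudocode row describes an iterative, gradient-based refinement of a sentence embedding. The sketch below is a hedged reconstruction of that update under the assumption that the garbled term is a gradient with respect to the previous embedding; `log_prob_fn`, the step size, and the number of steps are illustrative placeholders, not values from the paper.

```python
# Hedged reconstruction of h_t = h_{t-1} + alpha * grad_{h_{t-1}} log P(x2 | h_{t-1}).
# `log_prob_fn` stands in for the decoder's conditional log-likelihood of a target
# sentence x2 given the current embedding h; it must return a scalar tensor.
import torch


def revise_embedding(h0, log_prob_fn, alpha=0.1, steps=10):
    """Gradient-ascent refinement of an embedding toward higher log P(x2 | h)."""
    h = h0.clone().detach().requires_grad_(True)
    for _ in range(steps):
        log_p = log_prob_fn(h)                    # scalar log-likelihood
        grad, = torch.autograd.grad(log_p, h)
        h = (h + alpha * grad).detach().requires_grad_(True)
    return h.detach()


# Toy usage with a stand-in log-probability (a concave quadratic), for illustration only.
target = torch.randn(768)
h_init = torch.zeros(768)
h_revised = revise_embedding(h_init, lambda h: -((h - target) ** 2).sum())
```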