Towards Text Generation with Adversarially Learned Neural Outlines
Authors: Sandeep Subramanian, Sai Rajeswar Mudumba, Alessandro Sordoni, Adam Trischler, Aaron C. Courville, Chris Pal
NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our quantitative evaluations suggest that conditioning information from generated outlines is able to guide the autoregressive model to produce realistic samples, comparable to maximum-likelihood trained language models, even at high temperatures with multinomial sampling. Qualitative results also demonstrate that this generative procedure yields natural-looking sentences and interpolations. |
| Researcher Affiliation | Collaboration | Sandeep Subramanian1,2,4, Sai Rajeswar1,2,5, Alessandro Sordoni4, Adam Trischler4, Aaron Courville1,2,6, Christopher Pal1,3,5; 1Montréal Institute for Learning Algorithms, 2Université de Montréal, 3École Polytechnique de Montréal, 4Microsoft Research Montréal, 5Element AI Montréal, 6CIFAR Fellow |
| Pseudocode | No | The paper describes algorithms in text and equations (e.g., $\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_D}[\log D(E(x))] + \mathbb{E}_{z \sim P(z)}[\log(1 - D(G(z)))]$ and $h_t = h_{t-1} + \alpha \nabla_{h_{t-1}} \log P(x_2 \mid h_{t-1})$) but does not include structured pseudocode blocks. A hedged sketch of the gradient-based update appears after the table. |
| Open Source Code | No | No code release for this paper is mentioned; the cited link ('Official code from https://github.com/jakezhaojb/ARAE') points to the official implementation of the ARAE baseline. |
| Open Datasets | Yes | We consider the SNLI [4], Book Corpus [59] and WMT15 (English fraction of the En-De parallel corpus) datasets in unconditional text generation experiments. |
| Dataset Splits | No | In all settings, we partition the dataset into equally sized halves, one on which we train our generative model (GAN & Decoder) and the other for evaluation. |
| Hardware Specification | Yes | We thank NVIDIA for donating a DGX-1 computer used in this work and Fonds de recherche du Québec Nature et technologies for funding. |
| Software Dependencies | No | We are also grateful to the PyTorch development team [38]. We trained all models with the Adam [27] stochastic optimization algorithm with a learning rate of 2e-4 and β1 = 0.5, β2 = 0.9. |
| Experiment Setup | Yes | In our generator and discriminator, we use 5-layer MLPs with 1024 hidden dimensions and leaky ReLU activation functions. We use the WGAN-GP formulation [19] in all experiments, with 5 discriminator updates for every generator update and a gradient penalty coefficient of 10. Our decoder architecture is identical to the multi-task decoders used in [51]. We trained all models with the Adam [27] stochastic optimization algorithm with a learning rate of 2e-4 and β1 = 0.5, β2 = 0.9. We used a noise radius of 0.12 for experiments involving SNLI and 0.2 for others. A hedged code sketch of this training setup appears after the table. |
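
The following is a minimal PyTorch sketch of the training setup described in the Experiment Setup row (5-layer MLPs with 1024 hidden units and leaky ReLU, WGAN-GP with a gradient penalty coefficient of 10, 5 critic updates per generator update, Adam with a learning rate of 2e-4 and β1 = 0.5, β2 = 0.9). It is not the authors' code: the module names, the noise and outline dimensions, the leaky ReLU slope, and the assumption that `real_outlines` are embeddings from a fixed pretrained sentence encoder are all illustrative.

```python
# Hypothetical sketch of the WGAN-GP setup described in the table.
# Assumes real_outlines are fixed sentence embeddings from a pretrained encoder.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=1024, layers=5):
    """5-layer MLP with leaky ReLU activations (sizes per the paper's description)."""
    dims = [in_dim] + [hidden] * (layers - 1)
    blocks = []
    for i in range(len(dims) - 1):
        blocks += [nn.Linear(dims[i], dims[i + 1]), nn.LeakyReLU(0.2)]  # slope is an assumption
    blocks.append(nn.Linear(dims[-1], out_dim))
    return nn.Sequential(*blocks)

noise_dim, outline_dim = 100, 2048           # illustrative sizes
G = mlp(noise_dim, outline_dim)              # generator: noise -> outline vector
D = mlp(outline_dim, 1)                      # critic: outline vector -> scalar score

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.9))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.9))

def gradient_penalty(D, real, fake, coeff=10.0):
    """WGAN-GP penalty on interpolates between real and generated samples."""
    eps = torch.rand(real.size(0), 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(D(interp).sum(), interp, create_graph=True)[0]
    return coeff * ((grads.norm(2, dim=1) - 1) ** 2).mean()

def train_step(real_outlines, n_critic=5):
    """5 critic updates per generator update, as stated in the setup."""
    for _ in range(n_critic):
        z = torch.randn(real_outlines.size(0), noise_dim)
        fake = G(z).detach()
        d_loss = D(fake).mean() - D(real_outlines).mean() \
                 + gradient_penalty(D, real_outlines, fake)
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    z = torch.randn(real_outlines.size(0), noise_dim)
    g_loss = -D(G(z)).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```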
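
The Pseudocode row quotes an update of the form $h_t = h_{t-1} + \alpha \nabla_{h_{t-1}} \log P(x_2 \mid h_{t-1})$. As a hedged illustration only, the sketch below implements a generic gradient-ascent step on a conditioning vector `h` to raise the log-likelihood of a target sequence; `decoder_log_prob`, `ascent_step`, and the toy scorer are hypothetical names, not the paper's interface.

```python
# Hypothetical sketch of a gradient-ascent step on a conditioning vector h
# so that a decoder assigns higher likelihood to a target sequence x2.
# `decoder_log_prob` stands in for any differentiable log P(x2 | h).
import torch

def ascent_step(h, x2, decoder_log_prob, alpha=0.1):
    """One step of h <- h + alpha * grad_h log P(x2 | h)."""
    h = h.detach().requires_grad_(True)
    logp = decoder_log_prob(x2, h)           # scalar log-likelihood of x2 given h
    grad_h, = torch.autograd.grad(logp, h)
    return (h + alpha * grad_h).detach()

# Example with a toy differentiable scorer standing in for a real decoder:
def toy_log_prob(x2, h):
    return -((h - x2) ** 2).sum()            # peaks when h matches x2

h = torch.zeros(4)
x2 = torch.ones(4)
for _ in range(10):
    h = ascent_step(h, x2, toy_log_prob, alpha=0.2)
print(h)  # h moves toward x2 as the surrogate log-likelihood increases
```

With a real decoder, `decoder_log_prob` would sum the per-token log-probabilities of x2 under the decoder conditioned on h.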