Training Language GANs from Scratch

Authors: Cyprien de Masson d'Autume, Shakir Mohamed, Mihaela Rosca, Jack Rae

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show it is in fact possible to train a language GAN from scratch without maximum likelihood pre-training. We combine existing techniques such as large batch sizes, dense rewards and discriminator regularization to stabilize and improve language GANs. The resulting model, ScratchGAN, performs comparably to maximum likelihood training on EMNLP2017 News and WikiText-103 corpora according to quality and diversity metrics." and "5 Experimental Results: We use two datasets, EMNLP2017 News and Wikitext-103 [50]."
Researcher Affiliation | Industry | "Cyprien de Masson d'Autume, Mihaela Rosca, Jack Rae, Shakir Mohamed. DeepMind. {cyprien,mihaelacr,jwrae,shakir}@google.com"
Pseudocode | No | The paper includes architectural diagrams (Figure 1) but does not provide structured pseudocode or algorithm blocks.
Open Source Code | Yes | "The ScratchGAN code can be found at https://github.com/deepmind/deepmind-research/scratchgan."
Open Datasets | Yes | "We use two datasets, EMNLP2017 News [footnote 3: http://www.statmt.org/wmt17/] and Wikitext-103 [50]. For Wikitext-103 we use a vocabulary of 20k words." Reference [50]: Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. Pointer sentinel mixture models. arXiv preprint arXiv:1609.07843, 2016.
Dataset Splits | No | The paper mentions using 'validation data' and a 'validation set' from the standard Wikitext-103 and EMNLP2017 News datasets, but it does not explicitly state the splitting methodology (e.g., percentages, sample counts, or how the splits were generated).
Hardware Specification | Yes | "All our models are trained on individual sentences, using an NVIDIA P100 GPU."
Software Dependencies | No | The paper uses a Universal Sentence Encoder from TensorFlow Hub, but does not provide version numbers for software dependencies such as Python, TensorFlow, or other libraries.
Experiment Setup | Yes | "Model architectures, hyperparameters, regularization and experimental procedures for the results below are detailed in Appendix D."
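The Open Datasets row notes that the paper caps the Wikitext-103 vocabulary at 20k words. The paper defers preprocessing details to its Appendix D, so the following is only a minimal, hypothetical sketch of how such a frequency-capped vocabulary is typically built (the `build_vocab` helper and `<unk>` convention are assumptions, not taken from the paper):

```python
from collections import Counter

def build_vocab(sentences, max_size=20000, unk="<unk>"):
    """Keep the max_size most frequent words; everything else maps to <unk>.

    This mirrors common preprocessing for WikiText-style corpora; the
    paper's actual pipeline is described in its Appendix D.
    """
    counts = Counter(w for s in sentences for w in s.split())
    vocab = {unk: 0}
    for word, _ in counts.most_common(max_size - 1):
        vocab[word] = len(vocab)
    return vocab

# Toy corpus with max_size=4 for illustration; a real run would pass
# the Wikitext-103 training sentences with max_size=20000.
corpus = ["the cat sat on the mat", "the dog sat"]
vocab = build_vocab(corpus, max_size=4)
ids = [vocab.get(w, vocab["<unk>"]) for w in "the bird sat".split()]
```

Out-of-vocabulary tokens ("bird" above) fall back to the `<unk>` index, which keeps the model's embedding table at a fixed 20k size regardless of the corpus's full vocabulary.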