RelGAN: Relational Generative Adversarial Networks for Text Generation

Authors: Weili Nie, Nina Narodytska, Ankit Patel

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test RelGAN on both synthetic and real data, where the synthetic data are 10,000 discrete sequences generated by an oracle-LSTM with fixed parameters (Yu et al., 2017) and the real data include the COCO image captions (Chen et al., 2015) and EMNLP2017 WMT News, first used by Guo et al. (2017) for text generation. The experimental settings are given in Appendix A.
Researcher Affiliation | Collaboration | Weili Nie (Rice University, wn8@rice.edu); Nina Narodytska (VMware Research, nnarodytska@vmware.com); Ankit B. Patel (Rice University & Baylor College of Medicine, abp4@rice.edu)
Pseudocode | No | The paper contains no explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present any method or procedure as structured, code-like steps.
Open Source Code | Yes | Code for reproducing the core results is available at https://github.com/weilinie/RelGAN.
Open Datasets | Yes | We test RelGAN on both synthetic and real data, where the synthetic data are 10,000 discrete sequences generated by an oracle-LSTM with fixed parameters (Yu et al., 2017) and the real data include the COCO image captions (Chen et al., 2015) and EMNLP2017 WMT News, first used by Guo et al. (2017) for text generation. (An illustrative oracle-LSTM sampling sketch follows the table.)
Dataset Splits | Yes | The dataset includes 4,682 unique words with the maximum sentence length 37. Both the training and test data contain 10,000 sentences. (...) The EMNLP2017 WMT News dataset consists of 5,255 unique words with the maximum sentence length 51 after applying the same data pre-processing as in Zhu et al. (2018). The training data contains about 270,000 sentences and the test data contains 10,000 sentences.
Hardware Specification | No | The paper mentions running experiments and training models but does not specify any particular hardware, such as GPU/CPU models, memory, or cloud computing instance types.
Software Dependencies | No | The paper mentions using 'Adam (Kingma & Ba, 2014)' as an optimizer but does not specify its version number or any other software dependencies with their versions.
Experiment Setup | Yes | Unless stated otherwise, for the CNN-based discriminator architecture, we use filter windows of sizes {3,4,5} and 300 feature maps each. For relational memory, we set memory size to be 256, memory slots to be 1, number of heads to be 2. The batch size is set to be 64. For embedding dimensions, we set the embedding dimension of the input token for generator to be 32 and that for discriminator to be 1 with the number of embedded representations S = 64. We use Adam (Kingma & Ba, 2014) with β1 = 0.9 and β2 = 0.999 and gradient clipping is applied if the norm of gradients exceeds 5. We first pre-train the generator via MLE with learning rate of 1e-2 for 150 epochs and then start adversarial training with learning rate of 1e-4 for both discriminator and generator. For adversarial training, we set the maximum number of iterations N = 5000 and we perform 5 gradient descent steps on the discriminator for every step on the generator.
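
To make the quoted synthetic-data setup concrete, below is a minimal sketch of sampling 10,000 discrete sequences from an LSTM oracle whose parameters are fixed after random initialization, in the spirit of Yu et al. (2017). This is not the authors' implementation: the PyTorch framework, vocabulary size, sequence length, and hidden size are assumptions made purely for illustration.

```python
# Hypothetical sketch (not the authors' code): an oracle LSTM with fixed,
# randomly initialized parameters, used only to sample a synthetic dataset.
import torch
import torch.nn as nn

VOCAB_SIZE, SEQ_LEN, HIDDEN, N_SAMPLES = 5000, 20, 32, 10000  # assumed values

class OracleLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB_SIZE, HIDDEN)
        self.lstm = nn.LSTM(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, VOCAB_SIZE)

    @torch.no_grad()
    def sample(self, n):
        tokens = torch.zeros(n, 1, dtype=torch.long)  # fixed start token
        state, seq = None, []
        for _ in range(SEQ_LEN):
            h, state = self.lstm(self.emb(tokens), state)
            probs = torch.softmax(self.out(h[:, -1]), dim=-1)
            tokens = torch.multinomial(probs, 1)  # sample the next token id
            seq.append(tokens)
        return torch.cat(seq, dim=1)

torch.manual_seed(0)                        # oracle parameters stay fixed
oracle = OracleLSTM().eval()                # never trained
synthetic_data = oracle.sample(N_SAMPLES)   # (10000, SEQ_LEN) tensor of token ids
```

The Experiment Setup row can likewise be collected into a single configuration, together with the adversarial-training schedule it implies (5 discriminator updates per generator update). The dictionary keys and the two step functions passed to adversarial_training are hypothetical placeholders, not names from the released code; only the numeric values are taken from the quote above.

```python
# Hedged summary of the quoted hyperparameters; names are illustrative only.
RELGAN_CONFIG = {
    # CNN-based discriminator
    "disc_filter_sizes": [3, 4, 5],
    "disc_feature_maps": 300,              # feature maps per filter size
    # Relational memory (generator)
    "memory_size": 256,
    "memory_slots": 1,
    "num_heads": 2,
    # Embeddings
    "gen_token_embedding_dim": 32,
    "disc_token_embedding_dim": 1,
    "num_embedded_representations": 64,    # S = 64
    # Optimization
    "batch_size": 64,
    "adam_betas": (0.9, 0.999),
    "grad_clip_norm": 5.0,
    "pretrain_lr": 1e-2,
    "pretrain_epochs": 150,
    "adversarial_lr": 1e-4,
    "adversarial_iterations": 5000,        # N = 5000
    "disc_steps_per_gen_step": 5,
}

def adversarial_training(cfg, train_discriminator_step, train_generator_step):
    """Outer loop implied by the setup: 5 D steps for every G step."""
    for _ in range(cfg["adversarial_iterations"]):
        for _ in range(cfg["disc_steps_per_gen_step"]):
            train_discriminator_step()
        train_generator_step()
```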