RelGAN: Relational Generative Adversarial Networks for Text Generation
Authors: Weili Nie, Nina Narodytska, Ankit B. Patel
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test RelGAN on both synthetic and real data, where the synthetic data are 10,000 discrete sequences generated by an oracle-LSTM with fixed parameters (Yu et al., 2017) and the real data include the COCO image captions (Chen et al., 2015) and EMNLP2017 WMT News, first used by Guo et al. (2017) for text generation. The experimental settings are given in Appendix A. |
| Researcher Affiliation | Collaboration | Weili Nie (Rice University, wn8@rice.edu); Nina Narodytska (VMware Research, nnarodytska@vmware.com); Ankit B. Patel (Rice University & Baylor College of Medicine, abp4@rice.edu) |
| Pseudocode | No | The paper does not contain explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured, code-like steps for any method or procedure. |
| Open Source Code | Yes | Code for reproducing the core results is available at https://github.com/weilinie/RelGAN. |
| Open Datasets | Yes | We test RelGAN on both synthetic and real data, where the synthetic data are 10,000 discrete sequences generated by an oracle-LSTM with fixed parameters (Yu et al., 2017) and the real data include the COCO image captions (Chen et al., 2015) and EMNLP2017 WMT News, first used by Guo et al. (2017) for text generation. |
| Dataset Splits | Yes | The dataset includes 4,682 unique words with the maximum sentence length 37. Both the training and test data contain 10,000 sentences. (...) The EMNLP2017 WMT News dataset consists of 5,255 unique words with the maximum sentence length 51 after applying the same data pre-processing as in Zhu et al. (2018). The training data contains about 270,000 sentences and the test data contains 10,000 sentences. |
| Hardware Specification | No | The paper mentions running experiments and training models but does not specify any particular hardware components such as GPU/CPU models, memory, or cloud computing instance types. |
| Software Dependencies | No | The paper mentions using 'Adam (Kingma & Ba, 2014)' as an optimizer but does not specify its version number or any other software dependencies with their versions. |
| Experiment Setup | Yes | Unless stated otherwise, for the CNN-based discriminator architecture, we use filter windows of sizes {3,4,5} and 300 feature maps each. For relational memory, we set memory size to be 256, memory slots to be 1, number of heads to be 2. The batch size is set to be 64. For embedding dimensions, we set the embedding dimension of the input token for generator to be 32 and that for discriminator to be 1 with the number of embedded representations S = 64. We use Adam (Kingma & Ba, 2014) with β1 = 0.9 and β2 = 0.999 and gradient clipping is applied if the norm of gradients exceeds 5. We first pre-train the generator via MLE with learning rate of 1e-2 for 150 epochs and then start adversarial training with learning rate of 1e-4 for both discriminator and generator. For adversarial training, we set the maximum number of iterations N = 5000 and we perform 5 gradient descent steps on the discriminator for every step on the generator. |
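For readers attempting to reproduce the setup quoted in the Experiment Setup row, the hyperparameters can be gathered into a single configuration. The sketch below is a minimal, hedged rendering of those values in Python; the model classes and loss functions are not reconstructed, and the identifiers used here (`config`, `clip_and_step`) are illustrative rather than taken from the authors' TensorFlow code at https://github.com/weilinie/RelGAN.

```python
# Minimal sketch of the RelGAN experiment settings quoted in the table above.
# All names here (config, clip_and_step) are illustrative; the authors' actual
# implementation (in TensorFlow) lives at https://github.com/weilinie/RelGAN.
import torch

config = {
    # CNN-based discriminator
    "disc_filter_sizes": [3, 4, 5],   # filter window sizes
    "disc_num_filters": 300,          # feature maps per window size
    # Relational memory (generator)
    "memory_size": 256,
    "memory_slots": 1,
    "num_heads": 2,
    # Embeddings
    "gen_token_embed_dim": 32,
    "disc_token_embed_dim": 1,
    "num_embedded_reps": 64,          # S = 64
    # Optimization
    "batch_size": 64,
    "adam_betas": (0.9, 0.999),
    "grad_clip_norm": 5.0,
    "pretrain_lr": 1e-2,              # MLE pre-training of the generator
    "pretrain_epochs": 150,
    "adv_lr": 1e-4,                   # adversarial phase, generator and discriminator
    "adv_iterations": 5000,           # N
    "disc_steps_per_gen_step": 5,
}

def clip_and_step(optimizer: torch.optim.Optimizer, model: torch.nn.Module) -> None:
    """Clip gradients to the norm quoted in the paper, then take an optimizer step."""
    torch.nn.utils.clip_grad_norm_(model.parameters(), config["grad_clip_norm"])
    optimizer.step()
```

Under these assumptions, the quoted adversarial schedule would correspond to an outer loop of `config["adv_iterations"]` steps, each performing `config["disc_steps_per_gen_step"]` discriminator updates before a single generator update, with `clip_and_step` applied after every backward pass.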