Long Text Generation via Adversarial Training with Leaked Information

Authors: Jiaxian Guo, Sidi Lu, Han Cai, Weinan Zhang, Yong Yu, Jun Wang

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experiments on synthetic data and various real-world tasks with Turing test demonstrate that LeakGAN is highly effective in long text generation and also improves the performance in short text generation scenarios.
Researcher Affiliation | Academia | Shanghai Jiao Tong University, University College London; {jiaxian, steve_lu, hcai, wnzhang, yyu}@apex.sjtu.edu.cn, j.wang@cs.ucl.ac.uk
Pseudocode | No | The paper describes the model and training procedures using mathematical equations and textual descriptions, but does not provide pseudocode or an algorithm block.
Open Source Code | Yes | The repeatable experiment code is published for further research (footnote: https://github.com/CR-Gjx/LeakGAN).
Open Datasets | Yes | We choose the EMNLP2017 WMT Dataset as the long text corpus. Specifically, we pick the News section from the original dataset. Another real dataset we use is the COCO Image Captions Dataset (Chen et al. 2015), a dataset which contains groups of image-description pairs. To evaluate the performance of LeakGAN in short text generation, we pick the dataset of Chinese poems which is proposed by (Zhang and Lapata 2014).
Dataset Splits | No | For synthetic data experiments, we use it [the oracle LSTM] to generate 10,000 sequences of length 20 and 40 respectively as the training set S for the generative models. For EMNLP2017 WMT News, we randomly sample 200,000 sentences as the training set and another 10,000 sentences as the test set. For COCO, we randomly sample 80,000 sentences for the training set, and another 5,000 for the test set. The paper explicitly states train and test sets, but does not provide specific details for a *validation* split. (A minimal sketch of the described train/test sampling appears after the table.)
Hardware Specification | No | The paper mentions architectural choices for the models (CNN, LSTM) but does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper refers to standard neural network architectures like LSTM and CNN, and algorithms like REINFORCE, but does not list specific software libraries or their version numbers (e.g., PyTorch 1.9, TensorFlow 2.x).
Experiment Setup | Yes | For the synthetic data experiment, the CNN kernel size ranges from 1 to T. The number of each kernel is between 100 and 200. Dropout (Srivastava et al. 2014) with the keep rate 0.75 and L2 regularization are performed to avoid overfitting. The MANAGER produces the 16-dimensional goal embedding feature vector wt using the feature map extracted by CNN. The goal duration time c is a hyperparameter set as 4 after some preliminary experiments. In our experiment, for example, the model adopts hyperparameter δ = 12.0 and the sigmoid function as σ(·). Here we select a higher temperature when we are training the model and a lower temperature when we adopt the model to generate samples. (A sketch of such temperature-scaled sampling follows the table.)
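
The train/test sampling quoted under Dataset Splits can be written as a minimal sketch. The split sizes below are taken from the quoted text; the loader, function name, and random seed are illustrative assumptions, not part of the paper's released code.

```python
import random

def split_sentences(sentences, n_train, n_test, seed=0):
    """Randomly sample disjoint train/test sets of sentences.
    Sizes mirror the splits quoted above; everything else is an assumption."""
    rng = random.Random(seed)
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    assert len(shuffled) >= n_train + n_test, "corpus too small for requested split"
    return shuffled[:n_train], shuffled[n_train:n_train + n_test]

# EMNLP2017 WMT News: 200,000 train / 10,000 test
# COCO Image Captions:  80,000 train /  5,000 test
# wmt_sentences = open("news.txt").read().splitlines()  # hypothetical corpus file
# wmt_train, wmt_test = split_sentences(wmt_sentences, 200_000, 10_000)
```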
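The Experiment Setup row also mentions using a higher softmax temperature while training and a lower one when generating samples. Below is a minimal sketch of temperature-scaled categorical sampling, assuming raw logits produced by the generator; the concrete temperature values shown are illustrative and not taken from the paper.

```python
import numpy as np

def sample_with_temperature(logits, temperature):
    """Sample a token id from logits rescaled by a softmax temperature.
    Higher temperature -> flatter distribution (more exploration during training);
    lower temperature -> sharper distribution (more conservative generated samples)."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()            # subtract max for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

# Illustrative usage (temperature values are assumptions):
logits = [2.0, 1.0, 0.5, -1.0]
train_token = sample_with_temperature(logits, temperature=1.5)  # training-time exploration
gen_token = sample_with_temperature(logits, temperature=0.7)    # sharper generation-time samples
```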