Adaptively Multi-Objective Adversarial Training for Dialogue Generation

Authors: Xuemiao Zhang, Zhouxing Tan, Xiaoning Zhang, Yang Cao, Rui Yan

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on two real-world datasets show a significant improvement over the baselines.
Researcher Affiliation | Collaboration | Xuemiao Zhang (1), Zhouxing Tan (1), Xiaoning Zhang (2), Yang Cao (4) and Rui Yan (3). 1: School of Software & Microelectronics, Peking University; 2: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences; 3: Wangxuan Institute of Computer Technology, Peking University; 4: SenseTime Research.
Pseudocode | Yes | Algorithm 1: Training AMPGAN.
Open Source Code | No | The paper does not provide any explicit statement or link regarding the availability of its source code.
Open Datasets | Yes | Cornell Movie Dataset (denoted as S1) contains a large, metadata-rich collection of fictional conversations extracted from raw movie scripts. It consists of 220,579 exchanges between 10,292 pairs of movie characters, involving 9,035 characters across 617 movies, for a total of 304,713 utterances. OpenSubtitles Dataset (denoted as S2) is a well-known human-human scripted dialogue dataset, extracted from movie subtitles that are not speaker-aligned [Tiedemann, 2009].
Dataset Splits | No | The paper mentions using a validation set for early stopping ('if the performance on the validation set has not improved for a long time, stop training and choose the checkpoint with the best performance'), but does not provide specific split percentages or sample counts for the training, validation, and test sets. (A minimal sketch of this early-stopping rule is given below the table.)
Hardware Specification | No | The paper does not provide any specific hardware details, such as the CPU or GPU models used for running the experiments.
Software Dependencies | No | The paper mentions specific tools such as the Adam optimizer and the Stanford CoreNLP parser, but does not provide version numbers for these or any other software dependencies.
Experiment Setup | Yes | We set the training batch size to 128, the size of the word embeddings and the graph node embeddings to 512, the hidden size of the hidden layers of all encoders in all models to 256, the number of LSTM layers to 2, and the number of GCN layers in Dsyn to 1. We train all models using the Adam optimizer [Kingma and Ba, 2015] and set all dropout rates to 0.7. For the generator Gθ, we use a learning-rate decay strategy with an initial learning rate of 0.1 and a decay factor of 0.99. We fix the learning rate at 0.001 for Drf and Dsyn. In pre-training, we first pre-train Gθ for 5000 iterations, then use Gθ to produce 2500 × 128 negative examples and sample the same number of positive examples from the dataset; the two sets are combined to pre-train Drf and Dsyn. In MC search, we follow [Li et al., 2017]: given a partially decoded prefix s_P, Gθ keeps sampling tokens from the word distribution until decoding is complete. This process is repeated k times (k is set to 7) to obtain k sequences sharing the common prefix s_P, and the average of the corresponding k discriminator scores is used as the reward. (A hedged sketch of these settings and the MC search reward is given below the table.)
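The Experiment Setup row lists the reported hyperparameters and describes the Monte Carlo (MC) search used to compute rewards for partially decoded responses. The Python sketch below collects the reported values into a configuration dictionary and illustrates the rollout-averaging reward; generator.rollout and discriminator.score are assumed interfaces for illustration only, since the authors' code is not released.

```python
# Hyperparameters as reported in the Experiment Setup row.
CONFIG = {
    "batch_size": 128,
    "embedding_size": 512,        # word and graph-node embeddings
    "hidden_size": 256,           # hidden layers of all encoders
    "lstm_layers": 2,
    "gcn_layers_dsyn": 1,         # GCN layers in Dsyn
    "dropout": 0.7,
    "g_initial_lr": 0.1,          # generator learning rate, decayed by 0.99
    "g_lr_decay": 0.99,
    "d_lr": 0.001,                # fixed learning rate for Drf and Dsyn
    "g_pretrain_iterations": 5000,
    "d_pretrain_negatives": 2500 * 128,  # matched with the same number of positives
    "mc_rollouts": 7,             # k in MC search
}

def mc_search_reward(generator, discriminator, context, prefix, k=CONFIG["mc_rollouts"]):
    """Estimate the reward for a partially decoded response s_P via MC rollouts."""
    scores = []
    for _ in range(k):
        # Keep sampling tokens after the shared prefix until decoding is complete
        # (generator.rollout is an assumed interface).
        response = generator.rollout(context, prefix)
        # Discriminator score for the completed (context, response) pair
        # (discriminator.score is an assumed interface).
        scores.append(discriminator.score(context, response))
    # The reward is the average of the k discriminator scores.
    return sum(scores) / len(scores)
```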
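The Dataset Splits row quotes the paper's early-stopping criterion without giving split sizes. Below is a minimal sketch of that rule, assuming a higher-is-better validation metric; the function names (train_step, evaluate) and the patience value are illustrative placeholders, not details from the paper.

```python
import copy

def train_with_early_stopping(model, train_step, evaluate, patience=10):
    """Train until the validation score stops improving, then keep the best checkpoint."""
    best_score = float("-inf")
    best_model = copy.deepcopy(model)
    stale = 0
    while stale < patience:
        train_step(model)        # one training epoch (or a fixed number of iterations)
        score = evaluate(model)  # performance on the validation set, higher is better
        if score > best_score:
            best_score = score
            best_model = copy.deepcopy(model)  # checkpoint with the best performance so far
            stale = 0
        else:
            stale += 1
    return best_model, best_score
```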