Learning to Copy Coherent Knowledge for Response Generation

Authors: Jiaqi Bai, Ze Yang, Xinnian Liang, Wei Wang, Zhoujun Li (pp. 12535-12543)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The empirical studies are conducted on two benchmarks of goal-oriented knowledge-driven dialog generation. The results show that our model can significantly outperform several state-of-the-art models in terms of both automatic evaluation and human judgments.
Researcher Affiliation | Collaboration | Jiaqi Bai1, Ze Yang2, Xinnian Liang2, Wei Wang3, Zhoujun Li1,2. 1School of Cyber Science and Technology, Beihang University, Beijing, China; 2State Key Lab of Software Development Environment, Beihang University, Beijing, China; 3China Resources Group, Shenzhen, China
Pseudocode | No | No pseudocode or algorithm block found.
Open Source Code | Yes | Our code is released at https://github.com/jq2276/Learning2Copy.
Open Datasets | Yes | We conduct our experiments on two goal-oriented knowledge-driven datasets. One is DuConv (Wu et al. 2019), and the other is DuRecDial (Liu et al. 2020).
Dataset Splits | No | No explicit validation set split details (percentages or counts) are provided.
Hardware Specification | Yes | We trained our model on a GPU-V100 machine.
Software Dependencies | No | Only the PyTorch framework is mentioned, without a specific version number. No other software dependencies with version numbers are listed.
Experiment Setup | Yes | In our model, both the encoder and the decoder have two-layer structures; each layer has 800 hidden units with a dropout rate of 0.3, and the gradient clipping threshold is set to 5. The vocabulary size we use is 15k. We set the word embedding size to 300 and initialize the embedding vectors randomly instead of using pre-trained word embeddings. We use the Adam optimizer (Kingma and Ba 2014) to minimize the loss; the mini-batch size is 32 and the learning rate is 0.0001. We trained our model on a GPU-V100 machine. The whole training process is split into two stages. In the first stage, we train the model for 5 epochs to minimize the BOW loss only, pre-training the knowledge discernment module. In the second stage, we train the model for at most 25 epochs to minimize the overall loss.
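The experiment setup above is essentially a training configuration, so a minimal sketch may make it easier to scan. The snippet below is not the authors' released code (that is in the repository linked above); the Seq2SeqSketch class, the choice of GRU layers, and the bow_loss/overall_loss placeholders are assumptions used only to illustrate the quoted hyperparameters and the two-stage training schedule.

```python
# Minimal sketch of the reported training configuration, assuming PyTorch.
# Model internals are hypothetical stand-ins; see the released repository
# (https://github.com/jq2276/Learning2Copy) for the actual implementation.
import torch
from torch import nn
from torch.optim import Adam

# Hyperparameters quoted from the experiment setup.
HIDDEN_SIZE = 800        # hidden units per layer
NUM_LAYERS = 2           # two-layer encoder and decoder
DROPOUT = 0.3
GRAD_CLIP = 5.0          # gradient clipping threshold
VOCAB_SIZE = 15_000
EMBED_SIZE = 300         # randomly initialized, no pre-trained embeddings
BATCH_SIZE = 32
LEARNING_RATE = 1e-4
PRETRAIN_EPOCHS = 5      # stage 1: BOW loss only (knowledge discernment module)
MAX_EPOCHS = 25          # stage 2: overall loss

class Seq2SeqSketch(nn.Module):
    """Stand-in for the paper's model: 2-layer encoder/decoder with 800 hidden units.
    The GRU choice here is an assumption made only to give the sketch concrete shapes."""
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(VOCAB_SIZE, EMBED_SIZE)  # random init by default
        self.encoder = nn.GRU(EMBED_SIZE, HIDDEN_SIZE, NUM_LAYERS,
                              dropout=DROPOUT, batch_first=True)
        self.decoder = nn.GRU(EMBED_SIZE, HIDDEN_SIZE, NUM_LAYERS,
                              dropout=DROPOUT, batch_first=True)
        self.out = nn.Linear(HIDDEN_SIZE, VOCAB_SIZE)

model = Seq2SeqSketch()
optimizer = Adam(model.parameters(), lr=LEARNING_RATE)

def train_stage(data_loader, epochs, loss_fn):
    """Generic loop shared by both training stages."""
    for _ in range(epochs):
        for batch in data_loader:
            optimizer.zero_grad()
            loss = loss_fn(model, batch)
            loss.backward()
            # Clip gradients at the stated threshold of 5.
            nn.utils.clip_grad_norm_(model.parameters(), GRAD_CLIP)
            optimizer.step()

# Stage 1: pre-train the knowledge discernment module with the BOW loss only, e.g.
#   train_stage(train_loader, PRETRAIN_EPOCHS, bow_loss)
# Stage 2: train with the overall loss for at most 25 epochs, e.g.
#   train_stage(train_loader, MAX_EPOCHS, overall_loss)
# bow_loss, overall_loss, and train_loader are placeholders for the paper's
# loss functions and DuConv/DuRecDial data pipelines.
```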