Interpretable NLG for Task-oriented Dialogue Systems with Heterogeneous Rendering Machines

Authors: Yangming Li, Kaisheng Yao (pp. 13306-13314)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To verify the effectiveness of our method, we have conducted extensive experiments on 5 benchmark datasets. In terms of automatic metrics (e.g., BLEU), our model is competitive with the current state-of-the-art method. The qualitative analysis shows that our model can interpret the rendering process of neural generators well. Human evaluation also confirms the interpretability of our proposed approach.
Researcher Affiliation | Collaboration | Yangming Li (Harbin Institute of Technology), Kaisheng Yao (Ant Group); yangmingli@ir.hit.edu.cn, kaisheng.yao@antgroup.com
Pseudocode | No | The paper describes the model architecture and components using text and mathematical equations, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an unambiguous statement about releasing its own source code or a direct link to a code repository for the described methodology.
Open Datasets | Yes | We evaluate the models on five benchmark datasets. The Hotel dataset and the Restaurant dataset are collected in (Wen et al. 2015a). The Laptop dataset and the TV dataset are from (Wen et al. 2015b). The E2E-NLG dataset is released by a shared challenge (Novikova, Dušek, and Rieser 2017). All the datasets used in our paper follow the same format, pretreatment, and partition as in (Wen et al. 2015a,b; Novikova, Dušek, and Rieser 2017).
Dataset Splits | Yes | Other details of the datasets are demonstrated in Table 2. For each DA, we over-generate 10 utterances through beam search and select the top 5 candidates. In experiments, we select the model that works the best on the dev set, and then evaluate it on the test set. Table 2: The details of different datasets. Validation set sizes: 1039 (Restaurant), 1075 (Hotel), 2649 (Laptop), 1407 (TV), 4672 (E2E-NLG).
Hardware Specification | Yes | All the studies are conducted on a GeForce RTX 2080 Ti.
Software Dependencies | No | The paper mentions using Adam for optimization but does not provide specific version numbers for any software dependencies, such as programming languages, deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries.
Experiment Setup | Yes | The dimensionalities for all embeddings are 256. The hidden units of all layers are set as 512. We adopt 3 layers of self-attention and each of them has 4 heads. L2 regularization is set as 1 × 10^-6 and the dropout ratio is assigned 0.4 for reducing overfitting. The above setting is obtained by using grid search. We use Adam (Kingma and Ba 2014) to optimize model parameters.
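
The hyperparameters quoted in the Experiment Setup row map naturally onto a standard Transformer encoder configuration. Below is a minimal PyTorch sketch of one such reading, not the authors' released code: the module structure, the vocabulary size, the learning rate, and the interpretation of the "512 hidden units" as the feed-forward width are all assumptions made for illustration.

```python
# A minimal PyTorch sketch of the reported settings (not the authors' code):
# 256-d embeddings, 512 hidden units, 3 self-attention layers with 4 heads,
# dropout 0.4, and Adam with 1e-6 L2 regularization (as weight decay).
import torch
import torch.nn as nn

VOCAB_SIZE = 10_000  # placeholder; the real vocabulary size is dataset-specific

class EncoderSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, 256)            # embedding dim 256
        layer = nn.TransformerEncoderLayer(
            d_model=256, nhead=4, dim_feedforward=512,         # 4 heads, 512 hidden units
            dropout=0.4, batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=3)  # 3 self-attention layers

    def forward(self, token_ids):
        return self.encoder(self.embed(token_ids))

model = EncoderSketch()
# lr is not reported in the quoted excerpt; 1e-3 is only a placeholder.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-6)
```

Here the paper's L2 regularization of 1 × 10^-6 is approximated by Adam's weight decay, which is the usual way such a penalty is expressed in PyTorch.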
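The Dataset Splits row above quotes the paper's decoding protocol: over-generate 10 utterances per dialogue act with beam search, then keep the top 5 candidates. The sketch below illustrates that protocol assuming a HuggingFace-style seq2seq generator; the checkpoint name, the dialogue-act string format, and the score-based ranking of beams are assumptions, not details taken from the paper.

```python
# Sketch of the over-generation protocol: decode 10 candidates per dialogue
# act (DA) with beam search and keep the top 5. The checkpoint ("t5-small")
# and the DA string format are placeholders.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

def overgenerate(dialogue_act: str, n_beams: int = 10, n_keep: int = 5):
    inputs = tokenizer(dialogue_act, return_tensors="pt")
    sequences = model.generate(
        **inputs,
        num_beams=n_beams,              # over-generate 10 candidates
        num_return_sequences=n_beams,   # return every beam
        max_new_tokens=64,
    )
    texts = tokenizer.batch_decode(sequences, skip_special_tokens=True)
    # Beam-search outputs come back ordered by score, so the first n_keep
    # entries are the top candidates.
    return texts[:n_keep]

print(overgenerate("inform(name='Hotel Stratford'; pricerange=cheap)"))
```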