Improving the Robustness of Knowledge-Grounded Dialogue via Contrastive Learning
Authors: Jiaan Wang, Jianfeng Qu, Kexin Wang, Zhixu Li, Wen Hua, Ximing Li, An Liu
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on three benchmark datasets show that our method achieves new state-of-the-art performance in terms of automatic evaluation scores, verifying its effectiveness and potentiality. |
| Researcher Affiliation | Academia | Jiaan Wang¹, Jianfeng Qu¹*, Kexin Wang¹, Zhixu Li²*, Wen Hua³, Ximing Li⁴, An Liu¹. ¹ School of Computer Science and Technology, Soochow University, Suzhou, China; ² Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, Shanghai, China; ³ The Hong Kong Polytechnic University, Hong Kong SAR, China; ⁴ College of Computer Science and Technology, Jilin University, China |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://github.com/kxinwang2023/EnCo |
| Open Datasets | Yes | We conduct experiments on three widely-used public KGD datasets: (1) KdConv (Zhou et al. 2020) involves 4.5K dialogues and 86K utterances from music, travel and film domains. ... (2) DuConv (Wu et al. 2019) contains 180K samples... (3) DuRecDial (Liu et al. 2020) has 145K samples... |
| Dataset Splits | No | The paper mentions using a 'development set' for hyperparameter tuning and a 'test set' for evaluation, but does not explicitly provide the training/test/validation dataset splits (e.g., specific percentages or sample counts for each partition) needed to reproduce the experiment, nor does it state if standard splits provided with the datasets were used. |
| Hardware Specification | Yes | We train the KGD model on two 32GB Tesla V100 GPUs. |
| Software Dependencies | No | The paper states 'We implement our EnCo framework with PyTorch and Huggingface Transformers (Wolf et al. 2020) libraries' but does not specify version numbers for these software dependencies, which would be needed to reproduce the software environment. |
| Experiment Setup | Yes | The numbers of encoder layers (Ne) and decoder layers (Nd) are 12, and the hidden dimension d is 1,024. The number of multi-head attention heads in the context-knowledge fusion module (m) is set to 8. Following Wang et al. (2022c), the embedding size of entities and relations is set to 200. For each KGD sample, we create 5 positive and 5 negative samples. We leverage the Adam optimizer with a default initial momentum and adopt linear warmup in the first 1,000 steps. The mini-batch size is set to 8, and the coefficient α in the final loss function is set to 1.0. We perform minimal hyperparameter tuning, using learning rates (LRs) in [1e-5, 2e-5, 3e-5, 5e-5] and epochs of 10 to 20. We find the model with an LR of 5e-5 and 20 epochs to work best. (A hedged configuration sketch based on these values follows the table.) |
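
For convenience, the hyperparameters quoted in the experiment-setup row can be gathered into a single training configuration. The sketch below is a minimal reconstruction in PyTorch with Huggingface Transformers, assuming a generic encoder-decoder KGD model; the names `EnCoTrainingConfig`, `build_optimizer_and_scheduler`, and `total_loss` are illustrative placeholders rather than the authors' actual code, and only the numeric values come from the paper's description.

```python
# Minimal sketch of the reported training setup (assumptions noted inline).
# Only the numeric hyperparameters come from the paper; all names are illustrative.
from dataclasses import dataclass

import torch
from transformers import get_linear_schedule_with_warmup


@dataclass
class EnCoTrainingConfig:              # hypothetical name, for illustration only
    num_encoder_layers: int = 12       # Ne
    num_decoder_layers: int = 12       # Nd
    hidden_dim: int = 1024             # d
    fusion_attention_heads: int = 8    # m, context-knowledge fusion module
    kg_embedding_dim: int = 200        # entity/relation embedding size
    num_positive_samples: int = 5      # per KGD sample
    num_negative_samples: int = 5      # per KGD sample
    batch_size: int = 8
    learning_rate: float = 5e-5        # best LR among {1e-5, 2e-5, 3e-5, 5e-5}
    num_epochs: int = 20               # best among 10 to 20
    warmup_steps: int = 1000
    alpha: float = 1.0                 # coefficient on the contrastive term


def build_optimizer_and_scheduler(model: torch.nn.Module,
                                  cfg: EnCoTrainingConfig,
                                  steps_per_epoch: int):
    """Adam with default momentum terms and linear warmup over the first 1,000 steps."""
    optimizer = torch.optim.Adam(model.parameters(), lr=cfg.learning_rate)
    total_steps = steps_per_epoch * cfg.num_epochs
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=cfg.warmup_steps,
        num_training_steps=total_steps,
    )
    return optimizer, scheduler


def total_loss(gen_loss: torch.Tensor,
               contrastive_loss: torch.Tensor,
               cfg: EnCoTrainingConfig) -> torch.Tensor:
    # Assumption: alpha weights the contrastive term against the generation loss.
    return gen_loss + cfg.alpha * contrastive_loss
```

The paper reports training on two 32 GB Tesla V100 GPUs but does not state whether the mini-batch size of 8 is per device or global, so `steps_per_epoch` is left as an argument rather than derived from a dataset size.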