KPT: Keyword-Guided Pre-training for Grounded Dialog Generation
Authors: Qi Zhu, Fei Mi, Zheng Zhang, Yasheng Wang, Yitong Li, Xin Jiang, Qun Liu, Xiaoyan Zhu, Minlie Huang
AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on various few-shot knowledge-grounded generation tasks, including grounding on dialog acts, knowledge graphs, persona descriptions, and Wikipedia passages. Our comprehensive experiments and analyses demonstrate that KPT consistently outperforms state-of-the-art methods on these tasks with diverse grounding knowledge. |
| Researcher Affiliation | Collaboration | 1CoAI Group, DCST, IAI, BNRIST, Tsinghua University; 2Huawei Noah's Ark Lab |
| Pseudocode | Yes | Algorithm 1: Prepare keyword-guided pre-training data |
| Open Source Code | No | The paper mentions using 'ConvLab-3 (Zhu et al. 2022) for dataset loading and model training', which is a third-party toolkit, but does not provide a link or statement for the authors' own implementation code. |
| Open Datasets | Yes | As shown in Table 1, our pre-training datasets include DailyDialog (Li et al. 2017), Schema-Guided Dialog (Rastogi et al. 2020), Taskmaster-1/2/3 (Byrne et al. 2019, 2021), MetaLWOZ (Li et al. 2020), DSTC8-Reddit (Lee et al. 2019), and WikiDialog (Dai et al. 2022), covering chit-chat, goal-oriented dialogs, and information-seeking dialogs. |
| Dataset Splits | Yes | We randomly split the data into training (70%), validation (15%), and test set (15%). We fine-tune the models until the validation loss does not decrease for 5 consecutive epochs. Models with the lowest validation losses during training are selected as the final models. |
| Hardware Specification | Yes | We set the batch size per GPU to 64 and use 8/2 Tesla V100 32G GPUs for pre-training/fine-tuning. |
| Software Dependencies | No | The paper mentions software components like 'T5 (Raffel et al. 2020)', 'GPT-2 Large (Radford et al. 2019)', 'DialoGPT Large (762M)', and 'ConvLab-3 (Zhu et al. 2022)'. While specific models/toolkits are named, explicit version numbers for T5, GPT-2, or PyTorch/CUDA are not provided. |
| Experiment Setup | Yes | We consider two sizes of model: 60M T5-small and 220M T5-base. For both RG and KPT, we pre-train the models for 1 epoch. During pre-training, we set the keyword ratio α to 0.3... We use Adafactor optimizer with a constant learning rate 1e-3 for both pre-training and fine-tuning. We set the batch size per GPU to 64... |
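The "Pseudocode" row points to Algorithm 1 (preparing keyword-guided pre-training data), and the "Experiment Setup" row reports the keyword ratio α = 0.3. The sketch below illustrates what such a data-preparation step could look like; the stopword filter, random keyword sampling, and the `keywords: … context: …` input template are illustrative assumptions, not the paper's exact Algorithm 1.

```python
import random
import re

# Hypothetical stopword list; the keyword ratio 0.3 is the value reported
# in the paper, but its exact selection rules (Algorithm 1) may differ.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "are",
             "i", "you", "it", "that", "this", "do", "be", "for", "on"}
KEYWORD_RATIO = 0.3

def extract_keywords(response: str, ratio: float = KEYWORD_RATIO) -> list[str]:
    """Pick roughly `ratio` of the response's content words as keywords."""
    tokens = [t for t in re.findall(r"[a-zA-Z']+", response.lower())
              if t not in STOPWORDS]
    k = max(1, round(len(tokens) * ratio))
    return random.sample(tokens, min(k, len(tokens)))

def build_example(context_turns: list[str], response: str) -> dict:
    """Format one pre-training example: the extracted keywords act as
    pseudo grounding knowledge for the response (template is assumed)."""
    keywords = extract_keywords(response)
    source = ("keywords: " + ", ".join(keywords)
              + " context: " + " ".join(context_turns))
    return {"input": source, "target": response}

if __name__ == "__main__":
    dialog = ["Any plans for the weekend?",
              "I might go hiking if the weather holds."]
    print(build_example(dialog[:-1], dialog[-1]))
```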
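The "Dataset Splits" and "Experiment Setup" rows together describe the fine-tuning recipe: a random 70/15/15 split, Adafactor with a constant learning rate of 1e-3, a per-GPU batch size of 64, and early stopping after 5 epochs without a drop in validation loss. A minimal sketch wiring those reported values together with Hugging Face Transformers is shown below; the seed, the placeholder `train_one_epoch`/`eval_loss` callables, and the checkpointing details are assumptions rather than the authors' code.

```python
import random

from transformers import T5ForConditionalGeneration
from transformers.optimization import Adafactor

# Reported hyperparameters; the seed is an assumption.
SEED, BATCH_SIZE_PER_GPU, LR, PATIENCE = 42, 64, 1e-3, 5

def split_70_15_15(examples):
    """Random 70% / 15% / 15% train/validation/test split, as reported."""
    random.seed(SEED)
    examples = examples[:]
    random.shuffle(examples)
    n = len(examples)
    n_train, n_val = int(0.7 * n), int(0.15 * n)
    return (examples[:n_train],
            examples[n_train:n_train + n_val],
            examples[n_train + n_val:])

# T5-small (60M); the paper also uses T5-base (220M).
model = T5ForConditionalGeneration.from_pretrained("t5-small")
# Constant learning rate 1e-3: disable Adafactor's internal schedule.
optimizer = Adafactor(model.parameters(), lr=LR,
                      scale_parameter=False, relative_step=False)

def fine_tune(train_set, val_set, train_one_epoch, eval_loss):
    """Stop after PATIENCE epochs without a lower validation loss and
    keep the checkpoint with the lowest validation loss."""
    best_loss, best_state, epochs_since_best = float("inf"), None, 0
    while epochs_since_best < PATIENCE:
        train_one_epoch(model, optimizer, train_set, BATCH_SIZE_PER_GPU)
        loss = eval_loss(model, val_set)
        if loss < best_loss:
            best_loss = loss
            best_state = {k: v.clone() for k, v in model.state_dict().items()}
            epochs_since_best = 0
        else:
            epochs_since_best += 1
    model.load_state_dict(best_state)
    return model
```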