UBAR: Towards Fully End-to-End Task-Oriented Dialog System with GPT-2
Authors: Yunyi Yang, Yunhao Li, Xiaojun Quan
AAAI 2021, pp. 14230-14238 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the MultiWOZ datasets show that UBAR achieves state-of-the-art performance in multiple settings, improving the combined score of response generation, policy optimization, and end-to-end modeling by 4.7, 3.5, and 9.4 points, respectively (the combined score metric is sketched after the table). |
| Researcher Affiliation | Academia | Yunyi Yang, Yunhao Li, Xiaojun Quan* (Sun Yat-sen University) {yangyy37, liyh355}@mail2.sysu.edu.cn, quanxj3@mail.sysu.edu.cn |
| Pseudocode | No | No explicitly labeled pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | Code and technical appendix available at https://github.com/TonyNemo/UBAR-MultiWOZ |
| Open Datasets | Yes | MultiWOZ 2.0 (Budzianowski et al. 2018) is a large-scale human-to-human multi-domain task-oriented dialog dataset consisting of 8,438 dialogues spanning seven domains (attraction, hospital, police, hotel, restaurant, taxi, train). It provides an additional validation set and test set of 1,000 dialogues each, excluding the hospital and police domains. |
| Dataset Splits | Yes | MultiWOZ 2.0 (Budzianowski et al. 2018) is a large-scale human-to-human multi-domain task-oriented dialog dataset consisting of 8,438 dialogues spanning seven domains (attraction, hospital, police, hotel, restaurant, taxi, train). It provides an additional validation set and test set of 1,000 dialogues each, excluding the hospital and police domains. |
| Hardware Specification | No | No specific hardware details such as CPU/GPU models, memory, or cloud instance types used for running experiments were found. |
| Software Dependencies | No | The paper mentions 'Hugging Face's Transformers (Wolf et al. 2019) and DistilGPT2 (Sanh et al. 2019)' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | The model is trained on session-level sequences with a max sequence length of 1024; sequences that exceed 1024 tokens are pre-truncated. We use the AdamW optimizer and standard greedy decoding with a temperature of 0.7. We select the best-performing model on the validation set through a hyperparameter search over learning rate and batch size, then evaluate on the test set to get the final results (a minimal sketch of this setup follows the table). |
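
Below is a minimal sketch of the setup summarized in the Software Dependencies and Experiment Setup rows: DistilGPT2 loaded through Hugging Face's Transformers, session-level sequences pre-truncated to 1024 tokens, AdamW optimization, and greedy decoding. The example session string, the special tokens, and the learning rate are illustrative assumptions; the paper only reports that learning rate and batch size were chosen by a validation-set search.

```python
# Sketch of the reported training setup (assumed values where the paper
# gives none): DistilGPT2 via Hugging Face Transformers, session-level
# sequences capped at 1024 tokens, AdamW optimization, greedy decoding.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

MAX_LEN = 1024  # sequences longer than this are pre-truncated

tokenizer = GPT2Tokenizer.from_pretrained("distilgpt2")
model = GPT2LMHeadModel.from_pretrained("distilgpt2")

# Hypothetical session-level sequence: the full dialog history of user
# utterances, belief states, DB results, system acts, and responses.
session_text = "<sos_u> i need a cheap hotel <eos_u> <sos_b> hotel price cheap <eos_b> ..."
ids = tokenizer.encode(session_text)[-MAX_LEN:]  # keep at most 1024 tokens (assumed: most recent context)
input_ids = torch.tensor([ids])

# AdamW with an assumed learning rate; the paper selects learning rate and
# batch size on the validation set without stating the final values here.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

model.train()
outputs = model(input_ids, labels=input_ids)  # LM loss over the whole session
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()

# Greedy decoding at inference time (with do_sample=False the temperature
# is ignored, so the reported 0.7 only matters if sampling is enabled).
model.eval()
generated = model.generate(input_ids[:, :16], max_length=64, do_sample=False)
print(tokenizer.decode(generated[0]))
```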
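
For reference, the combined score cited in the Research Type row is the standard MultiWOZ metric, commonly computed as the average of the Inform and Success rates added to BLEU. A one-line sketch, assuming that standard definition (the excerpt above does not restate the formula):

```python
def combined_score(inform: float, success: float, bleu: float) -> float:
    # Standard MultiWOZ combined score: (Inform + Success) * 0.5 + BLEU
    return (inform + success) * 0.5 + bleu
```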