Non-Autoregressive Dialog State Tracking
Authors: Hung Le, Richard Socher, Steven C.H. Hoi
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results show that our model achieves the state-of-the-art joint accuracy across all domains on the MultiWOZ 2.1 corpus, and the latency of our model is an order of magnitude lower than the previous state of the art as the dialogue history extends over time. We conduct extensive ablation studies in which our analysis reveals that our models can detect potential signals across slots and dialogue domains to generate more correct sets of slots for DST. |
| Researcher Affiliation | Collaboration | Hung Le, Richard Socher, Steven C.H. Hoi; Salesforce Research {rsocher,shoi}@salesforce.com; Singapore Management University hungle.2018@phdcs.smu.edu.sg |
| Pseudocode | No | The paper describes the model architecture and components in prose and with diagrams (Figure 1), but does not present any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | We implemented our models using PyTorch (Paszke et al., 2017) and released the code on GitHub^1. ^1 https://github.com/henryhungle/NADST |
| Open Datasets | Yes | MultiWOZ (Budzianowski et al., 2018) is one of the largest publicly available multi-domain task-oriented dialogue datasets, with dialogues extending over 7 domains. In this paper, we use the new version of the MultiWOZ dataset published by Eric et al. (2019). |
| Dataset Splits | Yes | The resulting corpus includes 8,438 multi-turn dialogues in training set with an average of 13.5 turns per dialogue. For the test and validation set, each includes 1,000 multi-turn dialogues with an average of 14.7 turns per dialogue. |
| Hardware Specification | No | All latency results are reported when running in a single identical GPU. |
| Software Dependencies | No | We implemented our models using PyTorch (Paszke et al., 2017) and released the code on GitHub^1. |
| Experiment Setup | Yes | We employed dropout (Srivastava et al., 2014) of 0.2 at all network layers except the linear layers of generation network components and pointer attention components. We used a batch size of 32, embedding dimension d = 256 in all experiments. We also fixed the number of attention heads to 16 in all attention layers. ... In all experiments, the warmup steps are fine-tuned from a range from 13K to 20K training steps. |
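
The hyperparameters quoted in the Experiment Setup row can be summarized as a small configuration sketch. The sketch below is a minimal, hypothetical illustration only: the names `NADSTConfig` and `noam_lr` are not from the released code, and the Noam-style warmup schedule is an assumption based on the paper's mention of warmup steps, not a confirmed detail of the authors' training setup.

```python
# Hypothetical config sketch of the quoted hyperparameters (not the authors' code).
from dataclasses import dataclass


@dataclass
class NADSTConfig:
    d_model: int = 256         # embedding dimension d = 256 (quoted)
    n_heads: int = 16          # attention heads in all attention layers (quoted)
    dropout: float = 0.2       # dropout at most network layers (quoted)
    batch_size: int = 32       # quoted
    warmup_steps: int = 13000  # tuned in the 13K-20K range (quoted)


def noam_lr(step: int, d_model: int, warmup_steps: int) -> float:
    """Noam-style learning-rate schedule (assumed, standard for Transformer training)."""
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    cfg = NADSTConfig()
    # The learning rate rises until the warmup boundary, then decays.
    for s in (1_000, cfg.warmup_steps, 50_000):
        print(s, round(noam_lr(s, cfg.d_model, cfg.warmup_steps), 6))
```

A usage note: with this schedule, fine-tuning the warmup in the quoted 13K-20K range shifts where the learning rate peaks and how quickly it decays afterward, which is consistent with the paper's description of tuning warmup steps per experiment.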