Non-Autoregressive Dialog State Tracking
Authors: Hung Le, Richard Socher, Steven C.H. Hoi
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results show that our model achieves the state-of-the-art joint accuracy across all domains on the MultiWOZ 2.1 corpus, and the latency of our model is an order of magnitude lower than the previous state of the art as the dialogue history extends over time. We conduct extensive ablation studies in which our analysis reveals that our models can detect potential signals across slots and dialogue domains to generate more correct sets of slots for DST. |
| Researcher Affiliation | Collaboration | Hung Le, Richard Socher, Steven C.H. Hoi; Salesforce Research {rsocher,shoi}@salesforce.com; Singapore Management University hungle.2018@phdcs.smu.edu.sg |
| Pseudocode | No | The paper describes the model architecture and components in prose and with diagrams (Figure 1), but does not present any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | We implemented our models using PyTorch (Paszke et al., 2017) and released the code on GitHub^1. ^1 https://github.com/henryhungle/NADST |
| Open Datasets | Yes | MultiWOZ (Budzianowski et al., 2018) is one of the largest publicly available multi-domain task-oriented dialogue datasets, with dialogues extending over 7 domains. In this paper, we use the new version of the MultiWOZ dataset published by Eric et al. (2019). |
| Dataset Splits | Yes | The resulting corpus includes 8,438 multi-turn dialogues in training set with an average of 13.5 turns per dialogue. For the test and validation set, each includes 1,000 multi-turn dialogues with an average of 14.7 turns per dialogue. |
| Hardware Specification | No | All latency results are reported when running in a single identical GPU. |
| Software Dependencies | No | We implemented our models using PyTorch (Paszke et al., 2017) and released the code on GitHub^1. |
| Experiment Setup | Yes | We employed dropout (Srivastava et al., 2014) of 0.2 at all network layers except the linear layers of generation network components and pointer attention components. We used a batch size of 32, embedding dimension d = 256 in all experiments. We also fixed the number of attention heads to 16 in all attention layers. ... In all experiments, the warmup steps are fine-tuned from a range from 13K to 20K training steps. |
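
The hyperparameters quoted in the Experiment Setup row can be summarized as a small configuration sketch. The sketch below is a minimal, hypothetical illustration only: the names `NADSTConfig` and `noam_lr` are not from the released code, and the Noam-style warmup schedule is an assumption based on the paper's mention of warmup steps, not a confirmed detail of the authors' training setup.

```python
# Hypothetical config sketch of the quoted hyperparameters (not the authors' code).
from dataclasses import dataclass


@dataclass
class NADSTConfig:
    d_model: int = 256         # embedding dimension d = 256 (quoted)
    n_heads: int = 16          # attention heads in all attention layers (quoted)
    dropout: float = 0.2       # dropout at most network layers (quoted)
    batch_size: int = 32       # quoted
    warmup_steps: int = 13000  # tuned in the 13K-20K range (quoted)


def noam_lr(step: int, d_model: int, warmup_steps: int) -> float:
    """Noam-style learning-rate schedule (assumed, standard for Transformer training)."""
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    cfg = NADSTConfig()
    # The learning rate rises until the warmup boundary, then decays.
    for s in (1_000, cfg.warmup_steps, 50_000):
        print(s, round(noam_lr(s, cfg.d_model, cfg.warmup_steps), 6))
```

A usage note: with this schedule, fine-tuning the warmup in the quoted 13K-20K range shifts where the learning rate peaks and how quickly it decays afterward, which is consistent with the paper's description of tuning warmup steps per experiment.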