MA-DST: Multi-Attention-Based Scalable Dialog State Tracking
Authors: Adarsh Kumar, Peter Ku, Anuj Goyal, Angeliki Metallinou, Dilek Hakkani-Tur
AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our model improves the joint goal accuracy by 5% (absolute) in the full-data setting and by up to 2% (absolute) in the zero-shot setting over the present state-of-the-art on the MultiWOZ 2.1 dataset. We evaluate our approach on MultiWOZ, a multi-domain Wizard-of-Oz dataset. In this section we first describe the evaluation metrics and then present the results of our experiments. We compare the accuracy of MA-DST with the TRADE baseline and four additional ablation variants of our model. |
| Researcher Affiliation | Collaboration | Adarsh Kumar (1), Peter Ku (2), Anuj Goyal (2), Angeliki Metallinou (2), Dilek Hakkani-Tur (2); (1) University of Wisconsin-Madison, (2) Amazon Alexa AI, Sunnyvale, CA, USA. adarsh@cs.wisc.edu, {kupeter, anujgoya, ametalli, hakkanit}@amazon.com |
| Pseudocode | No | The paper describes its model architecture and components but does not provide any formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that its source code is publicly available. |
| Open Datasets | Yes | We evaluate our approach on MultiWOZ, a multi-domain Wizard-of-Oz dataset. MultiWOZ 2.0 is a recent dataset of labeled human-human written conversations spanning multiple domains and topics (Budzianowski et al. 2018). ... (Eric et al. 2019) released an updated version, called MultiWOZ 2.1, which corrected a significant number of errors. Here, we use the MultiWOZ 2.1 dataset as our benchmark. |
| Dataset Splits | Yes | We use the provided train/dev/test split for our experiments. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper mentions using GloVe, ELMo, GRUs, and the Adam optimizer, but it does not specify version numbers for these or for any other software libraries or frameworks. |
| Experiment Setup | Yes | We train the model using stochastic gradient descent and use the Adam optimizer. We empirically optimized the learning rate in the range [0.0005, 0.001] and used 0.0005 for the final model, while we kept betas as (0.9, 0.999) and epsilon 1×10^-8. We used a batch size of four dialog turns and for each turn we generate all 30 slot values. We decayed the learning rate after regular intervals (3 epochs) by a factor of 0.25... For ELMo, we kept a dropout of 0.5 for the contextual embedding and used L2 regularization for the weights of ELMo. We used a dropout of 0.2 for all the layers everywhere else. For word embeddings, we used 300-dimensional GloVe embeddings and 100-dimensional character embeddings. For all the GRU and attention layers the hidden size is kept at 400. The weight γ for the multi-task loss function in equation 18 is kept at 1. (A hedged configuration sketch follows the table.) |
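
The quoted setup maps onto a standard optimizer and scheduler configuration. Below is a minimal, hedged PyTorch sketch, assuming a placeholder encoder (`MADSTEncoderSketch`) rather than the actual MA-DST architecture; only the numeric hyperparameters (learning rate, betas, epsilon, decay schedule, dropout, embedding and hidden sizes, γ) are taken from the paper's description, and the ELMo-specific regularization is omitted.

```python
# Minimal sketch of the reported training configuration (not the authors' code).
# MADSTEncoderSketch, word_ids, and char_feats are hypothetical names.
import torch
import torch.nn as nn


class MADSTEncoderSketch(nn.Module):
    """Stand-in encoder reflecting the reported sizes, not the MA-DST architecture."""

    def __init__(self, vocab_size: int = 30000, char_feat_dim: int = 100):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, 300)   # 300-dim GloVe-initialized embeddings
        self.dropout = nn.Dropout(0.2)                  # dropout of 0.2 on all layers
        # 300-dim word embeddings + 100-dim character features feed a GRU
        # with the reported hidden size of 400.
        self.gru = nn.GRU(input_size=300 + char_feat_dim, hidden_size=400, batch_first=True)

    def forward(self, word_ids: torch.Tensor, char_feats: torch.Tensor) -> torch.Tensor:
        x = torch.cat([self.word_emb(word_ids), char_feats], dim=-1)
        out, _ = self.gru(self.dropout(x))
        return out


model = MADSTEncoderSketch()

# Adam with the reported learning rate, betas, and epsilon.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, betas=(0.9, 0.999), eps=1e-8)

# Decay the learning rate by a factor of 0.25 every 3 epochs, as reported;
# a training loop would call scheduler.step() once per epoch.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.25)

# Multi-task loss weight from equation 18 of the paper (gamma = 1); the two
# loss terms below are hypothetical placeholders for the model's objectives.
gamma = 1.0
# total_loss = slot_value_generation_loss + gamma * auxiliary_loss
```

Consistent with the quoted setup, a training run under these settings would use batches of four dialog turns and generate all 30 slot values for each turn.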