Dialogue State Induction Using Neural Latent Variable Models
Authors: Qingkai Min, Libo Qin, Zhiyang Teng, Xiao Liu, Yue Zhang
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments over the MultiWOZ [Budzianowski et al., 2018] and the SGD [Rastogi et al., 2019] datasets show that both DSI models can effectively find meaningful dialogue states. |
| Researcher Affiliation | Academia | Qingkai Min (1,2), Libo Qin (3), Zhiyang Teng (1,2), Xiao Liu (4) and Yue Zhang (1,2). 1: School of Engineering, Westlake University; 2: Institute of Advanced Technology, Westlake Institute for Advanced Study; 3: Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology; 4: School of Computer Science and Technology, Beijing Institute of Technology |
| Pseudocode | Yes | Algorithm 1 DSI-base and Algorithm 2 DSI-GM |
| Open Source Code | Yes | We release our code at https://github.com/taolusi/dialogue-state-induction. |
| Open Datasets | Yes | We evaluate our proposed DSI task and its effectiveness on the downstream tasks using the MultiWOZ 2.1 [Eric et al., 2019] dataset, which fixes some noisy state annotations in the MultiWOZ 2.0 [Budzianowski et al., 2018] dataset. MultiWOZ 2.1 contains 10,438 multi-turn dialogues and we follow the same partition as Wu et al. [2019a]. To justify the generalization of the proposed model, we also use the recently proposed SGD [Rastogi et al., 2019] dataset, which contains 16,142 multi-turn dialogues and is the largest existing conversational corpus. |
| Dataset Splits | Yes | All the models are trained over the training set, where hyper-parameters are tuned on the development set, before being finally used on the test set. We use the same train/validation split sets as Rastogi et al. [2019]. |
| Hardware Specification | No | The paper mentions that BERT is 'much slower and more resource-intensive for training' but does not specify any particular hardware details such as GPU models, CPU types, or memory specifications used for experiments. |
| Software Dependencies | No | The paper mentions using a 'pre-trained ELMo model' and the 'Stanford Core NLP toolkit' but does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | For the DSI models, the Adam optimizer is used to maximize the ELBO of Equation 15. All the models are trained over the training set, where hyper-parameters are tuned on the development set, before being finally used on the test set. Since no manual labels are available, we follow Liu et al. [2019a] and select the hyper-parameters which fit the best ELBO score on the dev set, as shown in Table 2. Table 2 hyper-parameters: domain number (DSI-GM only): 100; batch size: 200; slot number (DSI-base/DSI-GM): 300/1000; dropout: 0.2; feature dimension: 256; learning rate: 0.02; linear transformation layer size: 100; momentum: 0.99. |
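
To make the reported setup concrete, below is a minimal, hypothetical sketch of how the Table 2 hyper-parameters (feature dimension 256, linear transformation layer size 100, slot number 300, batch size 200, learning rate 0.02, dropout 0.2) could be wired into a VAE-style neural latent variable model trained with Adam to maximize an ELBO. This is not the authors' released implementation (see https://github.com/taolusi/dialogue-state-induction for that): the class name, variable names, and the Gaussian-latent-with-softmax parameterization are assumptions made for illustration only, and the Table 2 momentum value is omitted because Adam does not take a momentum argument directly.

```python
# Hypothetical DSI-base-style training step; NOT the authors' released code.
# Illustrates how the Table 2 hyper-parameters might enter an ELBO-trained
# latent variable model.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hyper-parameters quoted from Table 2 of the paper.
FEATURE_DIM = 256    # dimension of candidate-value features
HIDDEN_DIM = 100     # linear transformation layer size
NUM_SLOTS = 300      # slot number for DSI-base (1000 for DSI-GM)
BATCH_SIZE = 200
LEARNING_RATE = 0.02
DROPOUT = 0.2

class LatentSlotModel(nn.Module):
    """Toy VAE: encode candidate-value features into a latent slot
    distribution and reconstruct the features from it (assumed structure)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(FEATURE_DIM, HIDDEN_DIM), nn.ReLU(), nn.Dropout(DROPOUT))
        self.to_mu = nn.Linear(HIDDEN_DIM, NUM_SLOTS)
        self.to_logvar = nn.Linear(HIDDEN_DIM, NUM_SLOTS)
        self.decoder = nn.Linear(NUM_SLOTS, FEATURE_DIM)

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        slot_dist = F.softmax(z, dim=-1)   # latent "slot" proportions
        return self.decoder(slot_dist), mu, logvar

def elbo(recon, x, mu, logvar):
    # ELBO = reconstruction term - KL(q(z|x) || N(0, I)); maximized by Adam.
    recon_term = -F.mse_loss(recon, x, reduction="sum")
    kl = 0.5 * torch.sum(mu.pow(2) + logvar.exp() - 1.0 - logvar)
    return recon_term - kl

model = LatentSlotModel()
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

features = torch.randn(BATCH_SIZE, FEATURE_DIM)   # placeholder candidate features
optimizer.zero_grad()
recon, mu, logvar = model(features)
loss = -elbo(recon, features, mu, logvar)         # minimize the negative ELBO
loss.backward()
optimizer.step()
```

For a DSI-GM-style variant, Table 2 suggests replacing the single standard-normal prior with a mixture over 100 domain components and using 1,000 slots; the exact mixture parameterization is described in the paper's Algorithm 2 and is not reproduced in this sketch.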