A Student-Teacher Architecture for Dialog Domain Adaptation Under the Meta-Learning Setting
Authors: Kun Qian, Wei Wei, Zhou Yu
AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model on two multi-domain datasets, MultiWOZ and Google Schema-Guided Dialogue, and achieve state-of-the-art performance. Experimental results show that our model is effective in extracting domain-specific features and achieves a better domain adaptation performance. |
| Researcher Affiliation | Collaboration | 1 University of California, Davis; 2 Google Inc. |
| Pseudocode | Yes | Algorithm DAST |
| Open Source Code | No | We will release the code base upon acceptance. |
| Open Datasets | Yes | We evaluate our model on two multi-domain datasets, MultiWOZ (Budzianowski et al. 2018) and Schema-Guided Dataset (Rastogi et al. 2019). |
| Dataset Splits | Yes | For the adaptation, we randomly choose nine dialogs (2% of source domain) in the target domain as adaptation data and leave the rest for testing. The learning rate decays by half if no improvement is observed on validation data for 3 successive epochs and the training process would stop early when no improvement is observed on validation data for 5 successive epochs. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used (e.g., GPU/CPU models, memory) to run its experiments. |
| Software Dependencies | No | The paper mentions software components and algorithms like GloVe, Adam, GRU, and Transformer, but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | We adopt GloVe (Pennington, Socher, and Manning 2014) as the initialized value for word embeddings, with an embedding size of 50. For the student model, each GRU from encoders and decoders contains one layer and the hidden size is set as 100. Furthermore, the GRU models of two encoders are bi-directional. As for the teacher model, it contains 2 self-attention layers with 5 heads for each. We use Adam (Kingma and Ba 2014) for optimization and set an initialized learning rate as 0.005 for both student and teacher model, as well as the meta optimizer. The learning rate decays by half if no improvement is observed on validation data for 3 successive epochs and the training process would stop early when no improvement is observed on validation data for 5 successive epochs. We adopt the batch normalization (Ioffe and Szegedy 2015) and use a batch size of 32. |
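The Dataset Splits row above notes that nine target-domain dialogs (about 2% of a source domain) are held out for adaptation and the remainder is used for testing. Below is a minimal sketch of such a split; the function name `split_target_domain`, the seed handling, and the list-of-dialogs input format are assumptions for illustration, not the authors' released code.

```python
import random

def split_target_domain(dialogs, n_adapt=9, seed=0):
    """Hypothetical helper: hold out a small adaptation set
    (nine dialogs, ~2% of a source domain's size) and test on the rest."""
    rng = random.Random(seed)
    indices = list(range(len(dialogs)))
    rng.shuffle(indices)
    adapt = [dialogs[i] for i in indices[:n_adapt]]
    test = [dialogs[i] for i in indices[n_adapt:]]
    return adapt, test
```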
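The Experiment Setup row lists concrete hyperparameters (50-dimensional GloVe embeddings, one-layer bi-directional GRU encoders with hidden size 100, a 2-layer/5-head self-attention teacher, Adam at 0.005, learning-rate halving after 3 stagnant validation epochs, batch size 32). The PyTorch sketch below wires those numbers together so they are easy to check; the module names (`context_encoder`, `teacher`), the choice of PyTorch itself, and the assumption that the teacher's model dimension equals the embedding size are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

EMB_DIM, HIDDEN_DIM, BATCH_SIZE = 50, 100, 32  # values reported in the table above

# Student side: a one-layer bi-directional GRU encoder with hidden size 100
# (whether 100 is per direction is not stated; assumed here).
context_encoder = nn.GRU(input_size=EMB_DIM, hidden_size=HIDDEN_DIM,
                         num_layers=1, bidirectional=True, batch_first=True)

# Teacher side: 2 self-attention layers with 5 heads each.
# d_model = 50 is an assumption so that it divides evenly by the 5 heads.
teacher_layer = nn.TransformerEncoderLayer(d_model=EMB_DIM, nhead=5, batch_first=True)
teacher = nn.TransformerEncoder(teacher_layer, num_layers=2)

# Adam with an initial learning rate of 0.005; the paper applies the same rate
# to the student, the teacher, and the meta optimizer.
optimizer = torch.optim.Adam(
    list(context_encoder.parameters()) + list(teacher.parameters()), lr=0.005)

# Halve the learning rate after 3 epochs without validation improvement;
# early stopping after 5 such epochs would be handled in the training loop.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=3)
```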