Learning to Imagine: Distillation-Based Interactive Context Exploitation for Dialogue State Tracking

Authors: Jinyu Guo, Kai Shuang, Kaihang Zhang, Yixuan Liu, Jijie Li, Zihan Wang

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our approach extensively improves the performance of partial-history DST models and thereby achieves new state-of-the-art performance on multiple mainstream datasets while keeping high efficiency.
Researcher Affiliation | Academia | Jinyu Guo (1,2), Kai Shuang (1,2)*, Kaihang Zhang (1,2), Yixuan Liu (1,2), Jijie Li (3), Zihan Wang (4). 1: State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications; 2: School of Computer Science, Beijing University of Posts and Telecommunications; 3: Beijing Academy of Artificial Intelligence, Beijing, China; 4: Graduate School of Information Science and Technology, The University of Tokyo.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures).
Open Source Code | Yes | Code is available at https://github.com/guojinyu88/DICE-DST
Open Datasets | Yes | Our proposed method is evaluated on most of the mainstream benchmark task-oriented dialogue challenges: MultiWOZ 2.2, MultiWOZ 2.1, Sim-R, Sim-M, and DSTC2.
Dataset Splits | No | The paper mentions using standard benchmark datasets but does not explicitly provide specific percentages, sample counts, or a detailed methodology for the training, validation, and test splits; it only mentions 'on all test sets'.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or other machine specifications) used for running its experiments; it only refers to training in general terms without specifying the hardware.
Software Dependencies | No | While the paper mentions the 'ALBERT-large model' and the 'AdamW optimizer', it does not provide specific version numbers for underlying software libraries or dependencies (e.g., PyTorch, TensorFlow, CUDA versions).
Experiment Setup | Yes | We group 32 samples as a batch to jointly train the teacher encoder. For the student encoder, we employ a group size of 8 to batch process dialogue turns. ... We set the maximum number of turns to 16 and the maximum length of tokens to 512. We use the AdamW optimizer (Loshchilov and Hutter 2017) with β1 = 0.9, β2 = 0.999, ε = 1e-8 and set the warmup proportion to 0.1. We set the learning rate of the pre-trained language model parameters to 2e-5 and the learning rate of the other parameters to 1e-4. We utilize dropout (Srivastava et al. 2014) with a probability of 0.1.
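
The Experiment Setup row above maps onto a concrete optimizer configuration. Below is a minimal sketch of that configuration, assuming PyTorch and Hugging Face Transformers (the report notes these dependencies are not version-pinned in the paper); the checkpoint name, task head, linear warmup schedule, and total step count are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the reported optimizer setup (PyTorch + Hugging Face
# Transformers assumed). Checkpoint name, task head, schedule shape, and
# total step count are illustrative assumptions, not values from the paper.
import torch
from torch.optim import AdamW
from transformers import AlbertModel, get_linear_schedule_with_warmup

encoder = AlbertModel.from_pretrained("albert-large-v2")    # assumed checkpoint
task_head = torch.nn.Linear(encoder.config.hidden_size, 1)  # placeholder head

# Two learning rates, as reported: 2e-5 for the pre-trained LM parameters,
# 1e-4 for the remaining parameters.
optimizer = AdamW(
    [
        {"params": encoder.parameters(), "lr": 2e-5},
        {"params": task_head.parameters(), "lr": 1e-4},
    ],
    betas=(0.9, 0.999),
    eps=1e-8,
)

# Warmup proportion of 0.1 of the total training steps; the linear schedule
# and the step count here are assumptions for illustration only.
num_training_steps = 10_000
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * num_training_steps),
    num_training_steps=num_training_steps,
)
```

The two parameter groups reflect the paper's distinction between the pre-trained language model parameters and the other (randomly initialized) parameters, which receive a larger learning rate.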