Dialog State Tracking with Reinforced Data Augmentation

Authors: Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu (pp. 9474-9481)

AAAI 2020

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results on the WoZ and MultiWoZ (restaurant) datasets demonstrate that the proposed framework significantly improves the performance over the state-of-the-art models, especially with limited training data.
Researcher Affiliation Industry Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu; Huawei Noah's Ark Lab; {yinyichun, shang.lifeng, jiang.xin, chen.xiao2, qun.liu}@huawei.com
Pseudocode Yes Algorithm 1: The Reinforced Data Augmentation
Input: pre-trained Tracker with parameters θr; randomly initialized Generator with parameters θπ
Output: re-trained Tracker
1: Store θπ
2: for l = 1 … L do
3:   Re-initialize the Generator with θπ
4:   for n = 1 … N do
5:     Re-initialize the Tracker with θr
6:     Sample a bag B
7:     for j = 1 … M do
8:       Sample a new bag B'_j
9:     end for
10:    Compute the bag reward with Eq. 5
11:    Compute the instance rewards with Eq. 6
12:    Update θπ by the gradients in Eq. 4
13:  end for
14:  Obtain new data D' from the Generator
15:  Re-train the Tracker on D + D', update θr
16: end for
17: Save the Tracker with the θr that performs best on the validation set among the L epochs
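The control flow of Algorithm 1 can be sketched in plain Python. This is a minimal, hypothetical skeleton: the `Tracker` and `Generator` classes, their parameter updates, and the reward computations are stand-ins for the paper's actual models and Eqs. 4-6; only the nested alternate-learning loop mirrors the pseudocode.

```python
import random

class Generator:
    """Stand-in for the augmentation policy with parameters theta_pi."""
    def __init__(self, theta_pi):
        self.theta = list(theta_pi)

    def augment(self, bag):
        # Hypothetical augmentation: tag each utterance as paraphrased.
        return [u + " <aug>" for u in bag]

    def update(self, bag_reward, instance_rewards):
        # Placeholder for the policy-gradient update of Eq. 4.
        self.theta = [t + 0.01 * bag_reward for t in self.theta]

class Tracker:
    """Stand-in for the dialog state tracker with parameters theta_r."""
    def __init__(self, theta_r):
        self.theta = list(theta_r)

    def accuracy(self, data):
        # Placeholder for validation accuracy used as the bag reward (Eq. 5).
        return min(1.0, 0.5 + 0.01 * len(data))

    def retrain(self, data):
        # Placeholder for re-training on the combined dataset.
        self.theta = [t + 0.001 * len(data) for t in self.theta]

def reinforced_data_augmentation(train_data, L=2, N=3, M=2, seed=0):
    rng = random.Random(seed)
    theta_r0, theta_pi0 = [0.0], [0.0]               # step 1: store initial params
    best_theta_r, best_acc = theta_r0, -1.0
    for _ in range(L):                                # step 2: alternate-learning epochs
        gen = Generator(theta_pi0)                    # step 3: re-init Generator
        for _ in range(N):                            # step 4: Generator learning
            tracker = Tracker(theta_r0)               # step 5: re-init Tracker
            bag = rng.sample(train_data, min(3, len(train_data)))  # step 6
            for _ in range(M):                        # steps 7-9: sample new bags
                new_bag = gen.augment(bag)
            bag_reward = tracker.accuracy(new_bag)    # step 10: bag reward (Eq. 5)
            inst = [bag_reward] * len(new_bag)        # step 11: instance rewards (Eq. 6)
            gen.update(bag_reward, inst)              # step 12: policy update (Eq. 4)
        d_new = gen.augment(train_data)               # step 14: obtain D'
        tracker = Tracker(theta_r0)
        tracker.retrain(train_data + d_new)           # step 15: re-train on D + D'
        acc = tracker.accuracy(train_data)
        if acc > best_acc:                            # step 17: keep best on validation
            best_acc, best_theta_r = acc, tracker.theta
    return best_theta_r, best_acc
```

The key design point the pseudocode encodes is the bi-level loop: the Generator's policy is updated N times per outer epoch against a freshly re-initialized Tracker, so the reward always measures the augmented bag's effect from the same starting point.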
Open Source Code No The paper mentions implementing the model using PyTorch and provides a link to pytorch.org in a footnote, but it does not state that the authors' own source code for the described methodology is publicly available.
Open Datasets Yes We use WoZ (Wen et al. 2017) and MultiWoZ (Budzianowski et al. 2018) to evaluate the proposed framework on the task of dialog state tracking.
Dataset Splits No The paper mentions using a "validation set" and performing "sub-sampling experiments with ... different ratios [10%, 20%, 50%] of the training set" but does not specify the explicit train/validation/test splits (e.g., percentages or counts) for the main datasets used in the core experiments.
Hardware Specification No The paper does not specify the hardware used for running the experiments.
Software Dependencies No The paper states: "We implement the proposed model using PyTorch." (The footnote points to https://pytorch.org/.) However, it only mentions PyTorch without a specific version number, and no other software dependencies with version numbers are listed.
Experiment Setup Yes All hyper-parameters of our model are tuned based on the validation set. [...] The epoch number of the alternate learning L, the epoch number of the generator learning N, and the sampling times M for each bag are set to 5, 200, and 2, respectively. We set the dimensions of all hidden states to 200 in both the Tracker and the Generator, and set the head number of multi-head self-attention to 4 in the Tracker. All learnable parameters are optimized by the Adam optimizer with a learning rate of 1e-3. The batch size is set to 16 in the Tracker learning, and the bag size in the Generator learning is set to 25. To avoid over-fitting, we apply dropout to the layer of word embeddings with a rate of 0.2. [...] The newly augmented dataset is n times the size of the original training data (n = 5 for WoZ and n = 3 for MultiWoZ). At each iteration, we randomly sample a subset of the augmented data to train the Tracker. The sampling ratios are 0.4 for WoZ and 0.3 for MultiWoZ.
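The reported hyper-parameters can be collected into a single config for a reproduction attempt. The dictionary below only restates values quoted in the row above; the helper function is a hypothetical convenience showing how the augmentation multiplier and per-iteration sampling ratio combine.

```python
# Hyper-parameters as reported in the paper's experiment setup.
HPARAMS = {
    "L": 5,                      # alternate-learning epochs
    "N": 200,                    # generator-learning epochs
    "M": 2,                      # sampling times per bag
    "hidden_dim": 200,           # Tracker and Generator hidden states
    "attention_heads": 4,        # multi-head self-attention in the Tracker
    "optimizer": "adam",
    "learning_rate": 1e-3,
    "tracker_batch_size": 16,
    "generator_bag_size": 25,
    "embedding_dropout": 0.2,
    "aug_multiplier": {"woz": 5, "multiwoz": 3},    # |D'| = n * |D|
    "aug_sample_ratio": {"woz": 0.4, "multiwoz": 0.3},
}

def sampled_augmented_size(dataset, original_size):
    """Number of augmented examples drawn per Tracker-training iteration:
    the augmented pool is n times the original data, and a fixed ratio
    of that pool is sampled each iteration (helper name is illustrative)."""
    n = HPARAMS["aug_multiplier"][dataset]
    ratio = HPARAMS["aug_sample_ratio"][dataset]
    return int(n * original_size * ratio)
```

For example, with a 1,000-utterance WoZ training set, the augmented pool holds 5,000 examples and each iteration samples 40% of it.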