Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Dialog State Tracking with Reinforced Data Augmentation
Authors: Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu | Pages: 9474-9481
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the WoZ and MultiWoZ (restaurant) datasets demonstrate that the proposed framework significantly improves the performance over the state-of-the-art models, especially with limited training data. |
| Researcher Affiliation | Industry | Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu Huawei Noah's Ark Lab EMAIL |
| Pseudocode | Yes | Algorithm 1: The Reinforced Data Augmentation. Input: Pre-trained Tracker with parameters θr; the randomly initialized Generator with parameters θπ. Output: Re-trained Tracker. 1: Store θπ 2: for l = 1 to L do 3: Re-initialize the Generator with θπ 4: for n = 1 to N do 5: Re-initialize the Tracker with θr 6: Sample a bag B 7: for j = 1 to M do 8: Sample a new bag B_j 9: end for 10: Compute bag reward with Eq. 5 11: Compute instance reward with Eq. 6 12: Update θπ by the gradients in Eq. 4 13: end for 14: Obtain new data D' by the Generator 15: Re-train the Tracker on D + D', update θr 16: end for 17: Save the Tracker with θr which performs best on the validation set among the L epochs |
| Open Source Code | No | The paper mentions implementing the model using PyTorch and provides a link to pytorch.org in a footnote, but it does not state that the authors' own source code for the described methodology is publicly available. |
| Open Datasets | Yes | We use WoZ (Wen et al. 2017) and MultiWoZ (Budzianowski et al. 2018) to evaluate the proposed framework on the task of dialog state tracking. |
| Dataset Splits | No | The paper mentions using a "validation set" and performing "sub-sampling experiments with ... different ratios [10%, 20%, 50%] of the training set" but does not specify the explicit train/validation/test splits (e.g., percentages or counts) for the main datasets used in the core experiments. |
| Hardware Specification | No | The paper does not specify the hardware used for running the experiments. |
| Software Dependencies | No | The paper states: "We implement the proposed model using PyTorch." (Footnote 5 points to https://pytorch.org/). However, it only mentions PyTorch without a specific version number, and no other software dependencies with version numbers are listed. |
| Experiment Setup | Yes | All hyper-parameters of our model are tuned based on the validation set. [...] The epoch number of the alternate learning L, the epoch number of the generator learning N and the sampling times M for each bag are set to 5, 200 and 2 respectively. We set the dimensions of all hidden states to 200 in both the Tracker and the Generator, and set the head number of multi-head Self-Attention to 4 in the Tracker. All learnable parameters are optimized by the ADAM optimizer with a learning rate of 1e-3. The batch size is set to 16 in the Tracker learning, and the bag size in the Generator learning is set to 25. To avoid over-fitting, we apply dropout to the layer of word embeddings with a rate of 0.2. [...] The newly augmented dataset is n times the size of the original training data (n = 5 for WoZ and n = 3 for MultiWoZ). At each iteration, we randomly sample a subset of the augmented data to train the Tracker. The sampling ratios are 0.4 for WoZ and 0.3 for MultiWoZ. |
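
Since no source code is released, the nested loop structure of Algorithm 1 may be easier to follow as a short sketch. The Python below is only an illustration of the control flow quoted in the Pseudocode row, not the authors' implementation. The names `LoopConfig`, `run_alternate_learning`, and every method on `hooks` are hypothetical, and the reward and gradient computations (Eqs. 4 to 6 of the paper) are left to those hooks because this page does not reproduce them. The defaults for L, N, and M are the values quoted in the Experiment Setup row. The sketch also assumes that the θπ stored in step 1 is kept separate from the generator parameters updated in step 12, which the flattened pseudocode does not state explicitly.

```python
import copy
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopConfig:
    """Loop counts quoted in the Experiment Setup row (names are hypothetical)."""
    alternate_epochs: int = 5    # L
    generator_epochs: int = 200  # N
    samples_per_bag: int = 2     # M


def run_alternate_learning(theta_r, theta_pi, train_data, hooks, cfg=LoopConfig()):
    """Control flow of Algorithm 1; all learning logic is delegated to `hooks`."""
    stored_theta_pi = copy.deepcopy(theta_pi)                # step 1
    best_score = float("-inf")
    best_theta_r = copy.deepcopy(theta_r)
    for _ in range(cfg.alternate_epochs):                    # step 2
        theta_pi = copy.deepcopy(stored_theta_pi)            # step 3
        for _ in range(cfg.generator_epochs):                # step 4
            tracker = copy.deepcopy(theta_r)                 # step 5
            bag = hooks.sample_bag(train_data)               # step 6
            new_bags = [hooks.augment_bag(theta_pi, bag)     # steps 7-9
                        for _ in range(cfg.samples_per_bag)]
            bag_r = hooks.bag_reward(tracker, bag, new_bags)        # step 10
            inst_r = hooks.instance_reward(tracker, bag, new_bags)  # step 11
            theta_pi = hooks.update_generator(               # step 12
                theta_pi, new_bags, bag_r, inst_r)
        new_data = hooks.generate(theta_pi, train_data)      # step 14
        theta_r = hooks.retrain_tracker(                     # step 15
            theta_r, train_data + new_data)
        score = hooks.validate(theta_r)
        if score > best_score:                               # step 17
            best_score = score
            best_theta_r = copy.deepcopy(theta_r)
    return best_theta_r
```

With the quoted settings, the inner generator loop runs L × N = 1000 times in total and samples M = 2 new bags on each pass, while the Tracker is re-trained only L = 5 times. Any object that provides the eight hook methods can be passed as `hooks`, and `train_data` is assumed to be a plain list so that `train_data + new_data` concatenates the original and augmented examples.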