Goal-Oriented Dialogue Policy Learning from Failures

Authors: Keting Lu, Shiqi Zhang, Xiaoping Chen
Pages: 2596-2603

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments using a realistic user simulator show that our HER methods perform better than existing experience replay methods (as applied to deep Q-networks) in learning rate.
Researcher Affiliation | Academia | Keting Lu (1), Shiqi Zhang (2), Xiaoping Chen (1). (1) School of Computer Science, University of Science and Technology of China; (2) Department of Computer Science, SUNY Binghamton
Pseudocode | Yes | Algorithm 1 Dialogue Segmentation
Open Source Code | No | No explicit statement or link providing access to the open-source code for the described methodology was found.
Open Datasets | Yes | Our complex HER methods were evaluated using a dialogue simulation environment, where a dialogue agent communicates with simulated users on movie-booking tasks (Li et al. 2016; 2017).
Dataset Splits | No | The paper describes the number of dialogue episodes and runs but does not specify explicit training, validation, and test dataset splits as it uses a simulation environment where dialogues are generated.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running experiments.
Software Dependencies | No | The paper mentions the use of Deep Q-Networks (DQNs) but does not provide specific version numbers for any software dependencies like programming languages, libraries, or frameworks.
Experiment Setup | Yes | The size of experience pool is 100k, and experience replay strategy is uniform sampling. The value of α in Equation 2 is 1.0, and ϵ-greedy policy is used, where ϵ is initialized with 0.3, and decayed to 0.01 during training. Each experiment includes 1000 epochs. Each epoch includes 100 dialogue episodes. By the end of each epoch, we update the weights of target network using the current behavior network, and this update operation executes once every epoch.
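The reported experiment setup can be pictured with a short Python sketch. It is not the authors' code (no source release was found), and it only covers the values quoted above: a 100k experience pool with uniform sampling, ϵ going from 0.3 to 0.01, 1000 epochs of 100 episodes, and a target-network sync once per epoch. Several details are assumptions made for illustration: the linear shape of the ϵ decay, the oldest-first eviction when the pool is full, and all names (`ExperiencePool`, `epsilon_at`, `select_action`, `train`, and the `run_episode` callback) are hypothetical.

```python
import copy
import random

# Values quoted in the Experiment Setup row.
POOL_SIZE = 100_000
EPS_START, EPS_END = 0.3, 0.01
NUM_EPOCHS = 1000
EPISODES_PER_EPOCH = 100


def epsilon_at(epoch, num_epochs=NUM_EPOCHS):
    """Epsilon for a given epoch. ASSUMPTION: linear decay; the summary
    only states the start (0.3) and end (0.01) values."""
    frac = min(max(epoch / max(num_epochs - 1, 1), 0.0), 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)


class ExperiencePool:
    """Fixed-size pool with uniform sampling.
    ASSUMPTION: the oldest transition is overwritten once the pool is full."""

    def __init__(self, capacity=POOL_SIZE):
        self.capacity = capacity
        self.items = []
        self.next_slot = 0

    def add(self, transition):
        if len(self.items) < self.capacity:
            self.items.append(transition)
        else:
            self.items[self.next_slot] = transition
        self.next_slot = (self.next_slot + 1) % self.capacity

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement within one batch.
        return rng.sample(self.items, min(batch_size, len(self.items)))

    def __len__(self):
        return len(self.items)


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy choice over a list of Q-values."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def train(run_episode, behavior, num_epochs=NUM_EPOCHS,
          episodes_per_epoch=EPISODES_PER_EPOCH):
    """Skeleton loop. `run_episode(behavior, target, pool, epsilon)` is a
    hypothetical callback that runs one dialogue and updates `behavior`."""
    pool = ExperiencePool()
    target = copy.deepcopy(behavior)
    for epoch in range(num_epochs):
        epsilon = epsilon_at(epoch, num_epochs)
        for _ in range(episodes_per_epoch):
            run_episode(behavior, target, pool, epsilon)
        # Target network is synced from the behavior network once per epoch.
        target = copy.deepcopy(behavior)
    return behavior, target, pool
```

In this sketch the network is any deep-copyable object, so a plain dictionary of weights is enough to exercise the loop; a real DQN and the HER-specific handling of failed dialogues would live inside the `run_episode` callback.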