Goal-Oriented Dialogue Policy Learning from Failures
Authors: Keting Lu, Shiqi Zhang, Xiaoping Chen (pp. 2596-2603)
AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments using a realistic user simulator show that our HER methods perform better than existing experience replay methods (as applied to deep Q-networks) in learning rate. |
| Researcher Affiliation | Academia | Keting Lu (1), Shiqi Zhang (2), Xiaoping Chen (1); (1) School of Computer Science, University of Science and Technology of China; (2) Department of Computer Science, SUNY Binghamton |
| Pseudocode | Yes | Algorithm 1 Dialogue Segmentation |
| Open Source Code | No | No explicit statement or link providing access to the open-source code for the described methodology was found. |
| Open Datasets | Yes | Our complex HER methods were evaluated using a dialogue simulation environment, where a dialogue agent communicates with simulated users on movie-booking tasks (Li et al. 2016; 2017). |
| Dataset Splits | No | The paper reports the number of dialogue episodes and runs but does not specify explicit training, validation, and test splits, since dialogues are generated on the fly by a simulation environment. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running experiments. |
| Software Dependencies | No | The paper mentions the use of Deep Q-Networks (DQNs) but does not provide specific version numbers for any software dependencies like programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | The size of the experience pool is 100k, and the experience replay strategy is uniform sampling. The value of α in Equation 2 is 1.0, and an ε-greedy policy is used, where ε is initialized to 0.3 and decayed to 0.01 during training. Each experiment includes 1000 epochs, and each epoch includes 100 dialogue episodes. At the end of each epoch, we update the weights of the target network using the current behavior network; this update executes once per epoch. |
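
The Experiment Setup row maps onto a small training-harness configuration. The sketch below is a minimal illustration under stated assumptions: only the numeric values (pool size, α, ε schedule endpoints, epoch and episode counts, once-per-epoch target update) come from the row above, while the linear shape of the ε decay, the batch size, and all identifiers (`sample_uniform`, `epsilon_at`, the placeholder loop body) are assumptions, not the authors' code.

```python
import random
from collections import deque

# Minimal sketch of the reported setup; the dialogue agent, user simulator,
# and DQN update are out of scope and only noted in comments.

REPLAY_CAPACITY = 100_000   # "size of the experience pool is 100k"
ALPHA = 1.0                 # alpha in the paper's Equation 2
EPSILON_START = 0.3         # epsilon-greedy exploration, initialized to 0.3
EPSILON_END = 0.01          # decayed to 0.01 during training
NUM_EPOCHS = 1000           # each experiment includes 1000 epochs
EPISODES_PER_EPOCH = 100    # each epoch includes 100 dialogue episodes

replay_pool = deque(maxlen=REPLAY_CAPACITY)  # experience pool

def sample_uniform(pool, batch_size):
    """Uniform-sampling experience replay, as stated in the setup."""
    return random.sample(pool, min(batch_size, len(pool)))

def epsilon_at(epoch):
    """Exploration schedule; a linear decay is assumed (the paper gives only the endpoints)."""
    frac = epoch / max(NUM_EPOCHS - 1, 1)
    return EPSILON_START + frac * (EPSILON_END - EPSILON_START)

for epoch in range(NUM_EPOCHS):
    epsilon = epsilon_at(epoch)
    for _ in range(EPISODES_PER_EPOCH):
        # Here the agent would run one dialogue episode against the simulator
        # with epsilon-greedy action selection, append the transitions to
        # replay_pool, and perform DQN updates on uniformly sampled batches.
        pass
    # At the end of each epoch, copy the behavior network's weights into the
    # target network (once per epoch, per the reported setup).
```

The target-network synchronization is left as a comment because the paper, as summarized here, does not name a specific framework or weight-copy API.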