Efficient Dialog Policy Learning by Reasoning with Contextual Knowledge

Authors: Haodi Zhang, Zhichao Zeng, Keting Lu, Kaishun Wu, Shiqi Zhang (pp. 11667–11675)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We have extensively conducted experiments using a realistic dialog platform, PyDial (Ultes et al. 2017). Compared with baselines from the literature and ablations of our own approach, we observe significant improvements in dialog learning efficiency and policy quality.
Researcher Affiliation | Collaboration | Haodi Zhang^1, Zhichao Zeng^1, Keting Lu^2, Kaishun Wu^1, Shiqi Zhang^3. ^1 Computer Science and Software Engineering, Shenzhen University; ^2 Baidu, Inc.; ^3 Computer Science, SUNY Binghamton.
Pseudocode | Yes | Algorithm 1: Dialog policy learning by reasoning with contextual knowledge
Open Source Code | Yes | More details are available in the supplementary appendix and code (https://github.com/ResearchGroupHdZhang/DPLAAAI22).
Open Datasets | No | In the experiments, we use a revised version of a hotel booking domain in PyDial (Casanueva et al. 2017), where the main slots, i.e., internal factors, are the same as I in the previous section. Besides the revised evaluation criteria, we also modified the database to evaluate the reasoning capabilities of our developed approach. We enlarged the original database so that user goals would not be frequently rejected due to the lack of diverse data entities. More details are available in the supplementary appendix and code. (The paper mentions a modified database and 'historical data' for the MLN without providing explicit access to either. While PyDial is cited, the modified dataset actually used is neither released nor linked.)
Dataset Splits | No | The environment parameters are selected via a validation set. In each run, we use 40 batches, each containing 100 dialogs. After training with each batch, the policy is evaluated using 100 dialogs. (While a validation set is mentioned, its size or proportion is never specified. A minimal sketch of this batch-train/evaluate loop appears below the table.)
Hardware Specification | No | No specific hardware details (e.g., CPU or GPU models, memory, or cloud instances) are mentioned in the paper's description of the experimental setup.
Software Dependencies | No | In the experiment, we used a revised version of a hotel booking domain in PyDial (Casanueva et al. 2017)... For internal knowledge, we utilize Alchemy (Kok et al. 2005) to train an MLN... For external knowledge, we use Clingo (Gebser et al. 2014) to ground and solve our ASP logic programs. (No version numbers are given for PyDial, Alchemy, Clingo, or any other dependency. A sketch of the Clingo grounding-and-solving step appears below the table.)
Experiment Setup | No | In the experiment, we used several popular dialog strategy algorithms as baselines, including A2C (Fatemi et al. 2016), DQN, ACER (Weisz et al. 2018), and BBQN (Lipton et al. 2018). The environment parameters are selected via a validation set. In each run, we use 40 batches, each containing 100 dialogs. After training with each batch, the policy is evaluated using 100 dialogs. (Specific hyperparameter values such as learning rates, optimizer settings, or epoch counts for the DRL models are not provided.)
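
The "Dataset Splits" and "Experiment Setup" rows describe the paper's evaluation protocol: 40 training batches of 100 dialogs each, with the policy evaluated on 100 dialogs after every batch. The sketch below illustrates that loop only; `DialogEnv`, `Policy`, and their methods are hypothetical stand-ins, not the authors' PyDial code.

```python
import random


class DialogEnv:
    """Stand-in for a PyDial-style simulated-user environment (hypothetical)."""

    def run_dialog(self, policy, train=True):
        # A real environment would run a full user/system exchange and
        # return the dialog outcome; here we return a random success flag.
        return random.random() < 0.5


class Policy:
    """Placeholder dialog policy; a real one would be a DRL agent."""

    def update(self, outcome):
        pass  # an RL update (e.g., a DQN gradient step) would go here


def train_and_evaluate(env, policy, n_batches=40, batch_size=100, n_eval=100):
    """Train in batches of dialogs, evaluating the policy after each batch."""
    success_curve = []
    for _ in range(n_batches):
        for _ in range(batch_size):            # 100 training dialogs per batch
            policy.update(env.run_dialog(policy, train=True))
        wins = sum(env.run_dialog(policy, train=False) for _ in range(n_eval))
        success_curve.append(wins / n_eval)    # evaluate on 100 dialogs
    return success_curve


if __name__ == "__main__":
    curve = train_and_evaluate(DialogEnv(), Policy())
    print(f"success rate after final batch: {curve[-1]:.2f}")
```

Even this skeleton makes the reproducibility gap concrete: without the authors' environment parameters and policy hyperparameters, every placeholder above must be guessed.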
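The "Software Dependencies" row notes that Clingo is used to ground and solve the ASP programs encoding external knowledge. Below is a minimal sketch of that step using Clingo's Python API; the rules are invented hotel-domain facts for illustration, not the authors' actual logic program.

```python
import clingo

# Hypothetical external knowledge: a hotel near the beach is of interest
# to a user whose stated purpose is a vacation.
PROGRAM = """
interested(hotel_a) :- near_beach(hotel_a), purpose(vacation).
near_beach(hotel_a).
purpose(vacation).
"""


def solve(program):
    ctl = clingo.Control(["0"])        # "0" = enumerate all answer sets
    ctl.add("base", [], program)       # register the program as part "base"
    ctl.ground([("base", [])])         # ground it
    answer_sets = []
    with ctl.solve(yield_=True) as handle:
        for model in handle:
            answer_sets.append([str(atom) for atom in model.symbols(shown=True)])
    return answer_sets


if __name__ == "__main__":
    for answer_set in solve(PROGRAM):
        print(answer_set)
```

Because the paper's ASP programs and the Alchemy-trained MLN are only described in the supplementary material and repository, reproducing the reasoning component requires consulting those sources; the version of Clingo used is not stated.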