Goal-Oriented Chatbot Dialog Management Bootstrapping with Transfer Learning

Authors: Vladimir Ilievski, Claudiu Musat, Andreea Hossmann, Michael Baeriswyl

IJCAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our transfer learning based approach improves the bot’s success rate by 20% in relative terms for distant domains and we more than double it for close domains, compared to the model without transfer learning.
Researcher Affiliation | Collaboration | Vladimir Ilievski¹, Claudiu Musat², Andreea Hossmann², Michael Baeriswyl²; ¹School of Computer and Communication Sciences, EPFL, Switzerland; ²Artificial Intelligence Group, Swisscom AG
Pseudocode | Yes | The pseudocode for this weight initialization is portrayed in the Algorithm 1.
Open Source Code | No | The paper provides a GitHub link (https://github.com/IlievskiV/MasterThesis_GO_Chatbots) in footnote 2, but it is explicitly stated to be for 'New published datasets' and does not claim to host the source code for the methodology itself.
Open Datasets | Yes | New published datasets: We publish new datasets for training Goal-Oriented Dialogue Systems, for restaurant booking and tourist info domains (footnote 2: https://github.com/IlievskiV/MasterThesis_GO_Chatbots).
Dataset Splits | No | The paper states 'For each domain, we have a training set of 120 user goals, and a testing set of 32 user goals' but does not explicitly mention a validation set or its split.
Hardware Specification | No | The paper discusses the use of Deep Reinforcement Learning and Deep Q-Networks but does not specify any hardware details such as GPU models, CPU types, or memory used for the experiments.
Software Dependencies | No | The paper mentions techniques and models like 'Deep Q-Networks (DQN)' and 'Recurrent Neural Networks (RNNs)' but does not list specific software dependencies with version numbers (e.g., Python 3.x, TensorFlow 1.x, PyTorch 0.x).
Experiment Setup | Yes | In all experiments, when we use a warm-starting, the criterion is to fill agent’s buffer, such that 30 percent of the buffer is filled with positive experiences (coming from a successful dialogue). After that, we train for n_epochs = 50 epochs, each simulating n_dialogues = 100 dialogues. We flush the agent’s buffer when the agent reaches, for a first time, a success rate of s_rule_based = 0.3. We set the maximal number of allowed dialogue turns n_max_turns to 20, thus the negative reward r_negative for a failed dialogue is 20, while the positive reward r_positive for a successful dialogue is 40. In the consecutive dialogue turns over the course of the conversation, the agent receives a negative reward of r_ongoing = 1. In all cases we set ϵ = 0.05 to leave a space for exploration.
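
The Experiment Setup row quotes all of the training hyperparameters and the reward scheme. As a reading aid, here is a minimal Python sketch of how those values could be grouped into a configuration and a per-turn reward rule; it is a sketch under the assumption of a dataclass-based setup, and the names `TrainingConfig` and `turn_reward` are illustrative, not taken from the paper or its repository.

```python
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    # Values quoted in the Experiment Setup row; names are illustrative.
    warm_start_positive_fraction: float = 0.30  # 30% of the buffer filled with positive experiences
    n_epochs: int = 50                          # training epochs
    n_dialogues: int = 100                      # simulated dialogues per epoch
    s_rule_based: float = 0.30                  # flush the buffer when this success rate is first reached
    n_max_turns: int = 20                       # maximal number of allowed dialogue turns
    r_positive: int = 40                        # reward for a successful dialogue
    r_negative: int = 20                        # penalty magnitude for a failed dialogue
    r_ongoing: int = 1                          # per-turn penalty magnitude while the dialogue is ongoing
    epsilon: float = 0.05                       # epsilon-greedy exploration rate


def turn_reward(cfg: TrainingConfig, dialogue_over: bool, success: bool) -> int:
    """Reward signal for one dialogue turn under the scheme quoted above."""
    if not dialogue_over:
        return -cfg.r_ongoing                   # ongoing turn: small negative reward
    return cfg.r_positive if success else -cfg.r_negative


if __name__ == "__main__":
    cfg = TrainingConfig()
    print(turn_reward(cfg, dialogue_over=False, success=False))  # -1  (ongoing turn)
    print(turn_reward(cfg, dialogue_over=True, success=True))    # 40  (successful dialogue)
    print(turn_reward(cfg, dialogue_over=True, success=False))   # -20 (failed dialogue)
```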