Goal-Oriented Chatbot Dialog Management Bootstrapping with Transfer Learning

Authors: Vladimir Ilievski, Claudiu Musat, Andreea Hossmann, Michael Baeriswyl

IJCAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our transfer learning based approach improves the bot’s success rate by 20% in relative terms for distant domains and we more than double it for close domains, compared to the model without transfer learning.
Researcher Affiliation | Collaboration | Vladimir Ilievski¹, Claudiu Musat², Andreea Hossmann², Michael Baeriswyl²; ¹School of Computer and Communication Sciences, EPFL, Switzerland; ²Artificial Intelligence Group, Swisscom AG
Pseudocode | Yes | The pseudocode for this weight initialization is portrayed in the Algorithm 1.
Open Source Code | No | The paper provides a GitHub link (https://github.com/IlievskiV/MasterThesis_GO_Chatbots) in footnote 2, but it is explicitly stated to be for 'New published datasets' and does not claim to host the source code for the methodology itself.
Open Datasets | Yes | New published datasets: We publish new datasets for training Goal-Oriented Dialogue Systems, for restaurant booking and tourist info domains (footnote 2: https://github.com/IlievskiV/MasterThesis_GO_Chatbots).
Dataset Splits | No | The paper states 'For each domain, we have a training set of 120 user goals, and a testing set of 32 user goals' but does not explicitly mention a validation set or its split.
Hardware Specification | No | The paper discusses the use of Deep Reinforcement Learning and Deep Q-Networks but does not specify any hardware details such as GPU models, CPU types, or memory used for the experiments.
Software Dependencies | No | The paper mentions techniques and models like 'Deep Q-Networks (DQN)' and 'Recurrent Neural Networks (RNNs)' but does not list specific software dependencies with version numbers (e.g., Python 3.x, TensorFlow 1.x, PyTorch 0.x).
Experiment Setup | Yes | In all experiments, when we use a warm-starting, the criterion is to fill agent’s buffer, such that 30 percent of the buffer is filled with positive experiences (coming from a successful dialogue). After that, we train for n_epochs = 50 epochs, each simulating n_dialogues = 100 dialogues. We flush the agent’s buffer when the agent reaches, for a first time, a success rate of s_rule_based = 0.3. We set the maximal number of allowed dialogue turns n_max_turns to 20, thus the negative reward r_negative for a failed dialogue is 20, while the positive reward r_positive for a successful dialogue is 40. In the consecutive dialogue turns over the course of the conversation, the agent receives a negative reward of r_ongoing = 1. In all cases we set ϵ = 0.05 to leave a space for exploration.
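
The Experiment Setup row quotes all of the training hyperparameters and the reward scheme. As a reading aid, here is a minimal Python sketch of how those values could be grouped into a configuration and a per-turn reward rule; it is a sketch under the assumption of a dataclass-based setup, and the names `TrainingConfig` and `turn_reward` are illustrative, not taken from the paper or its repository.

```python
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    # Values quoted in the Experiment Setup row; names are illustrative.
    warm_start_positive_fraction: float = 0.30  # 30% of the buffer filled with positive experiences
    n_epochs: int = 50                          # training epochs
    n_dialogues: int = 100                      # simulated dialogues per epoch
    s_rule_based: float = 0.30                  # flush the buffer when this success rate is first reached
    n_max_turns: int = 20                       # maximal number of allowed dialogue turns
    r_positive: int = 40                        # reward for a successful dialogue
    r_negative: int = 20                        # penalty magnitude for a failed dialogue
    r_ongoing: int = 1                          # per-turn penalty magnitude while the dialogue is ongoing
    epsilon: float = 0.05                       # epsilon-greedy exploration rate


def turn_reward(cfg: TrainingConfig, dialogue_over: bool, success: bool) -> int:
    """Reward signal for one dialogue turn under the scheme quoted above."""
    if not dialogue_over:
        return -cfg.r_ongoing                   # ongoing turn: small negative reward
    return cfg.r_positive if success else -cfg.r_negative


if __name__ == "__main__":
    cfg = TrainingConfig()
    print(turn_reward(cfg, dialogue_over=False, success=False))  # -1  (ongoing turn)
    print(turn_reward(cfg, dialogue_over=True, success=True))    # 40  (successful dialogue)
    print(turn_reward(cfg, dialogue_over=True, success=False))   # -20 (failed dialogue)
```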