Goal-Oriented Chatbot Dialog Management Bootstrapping with Transfer Learning
Authors: Vladimir Ilievski, Claudiu Musat, Andreea Hossmann, Michael Baeriswyl
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our transfer learning based approach improves the bot’s success rate by 20% in relative terms for distant domains and we more than double it for close domains, compared to the model without transfer learning. |
| Researcher Affiliation | Collaboration | Vladimir Ilievski (1), Claudiu Musat (2), Andreea Hossmann (2), Michael Baeriswyl (2); (1) School of Computer and Communication Sciences, EPFL, Switzerland; (2) Artificial Intelligence Group, Swisscom AG |
| Pseudocode | Yes | The pseudocode for this weight initialization is given in Algorithm 1 of the paper. (A sketch of the initialization idea follows the table.) |
| Open Source Code | No | The paper provides a GitHub link (https://github.com/IlievskiV/MasterThesis_GO_Chatbots) in footnote 2, but it is explicitly stated to be for 'New published datasets' and does not claim to host the source code for the methodology itself. |
| Open Datasets | Yes | New published datasets: We publish new datasets for training Goal-Oriented Dialogue Systems, for the restaurant booking and tourist info domains (footnote 2: https://github.com/IlievskiV/MasterThesis_GO_Chatbots). |
| Dataset Splits | No | The paper states 'For each domain, we have a training set of 120 user goals, and a testing set of 32 user goals' but does not explicitly mention a validation set or its split. |
| Hardware Specification | No | The paper discusses the use of Deep Reinforcement Learning and Deep Q-Networks but does not specify any hardware details such as GPU models, CPU types, or memory used for the experiments. |
| Software Dependencies | No | The paper mentions techniques and models like 'Deep Q-Networks (DQN)' and 'Recurrent Neural Networks (RNNs)' but does not list specific software dependencies with version numbers (e.g., Python 3.x, TensorFlow 1.x, PyTorch 0.x). |
| Experiment Setup | Yes | In all experiments, when we use warm-starting, the criterion is to fill the agent's buffer such that 30 percent of the buffer is filled with positive experiences (coming from successful dialogues). After that, we train for n_epochs = 50 epochs, each simulating n_dialogues = 100 dialogues. We flush the agent's buffer when the agent reaches, for the first time, a success rate of s_rule_based = 0.3. We set the maximal number of allowed dialogue turns n_max_turns to 20, thus the negative reward r_negative for a failed dialogue is -20, while the positive reward r_positive for a successful dialogue is 40. In the consecutive dialogue turns over the course of the conversation, the agent receives a negative reward of r_ongoing = -1. In all cases we set ε = 0.05 to leave space for exploration. (A sketch of this setup follows the table.) |
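
The Pseudocode row refers to the paper's Algorithm 1, which warm-starts the target-domain agent by reusing weights learned on a source domain for the slots the two domains share, while the remaining weights are initialized from scratch. The sketch below only illustrates that idea under assumptions: the function name, the slot-indexed layout of the first layer, the random-initialization scale, and the example slot names are hypothetical and not the authors' code.

```python
import numpy as np

def transfer_initialize(source_weights, source_slots, target_slots, hidden_dim, rng=None):
    """Illustrative warm-start initialization (assumed layout, not the paper's code).

    Rows of the first layer that correspond to slots present in BOTH domains are
    copied from the pre-trained source-domain model; rows for target-only slots
    are drawn from a small random initialization.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    # Start from a fresh random initialization for every target slot.
    target_weights = rng.normal(0.0, 0.01, size=(len(target_slots), hidden_dim))

    source_index = {slot: i for i, slot in enumerate(source_slots)}
    for j, slot in enumerate(target_slots):
        if slot in source_index:  # slot shared between source and target domains
            target_weights[j] = source_weights[source_index[slot]]
    return target_weights

# Hypothetical usage: a restaurant-booking source model reused for tourist info.
src_slots = ["date", "time", "area", "food"]          # assumed slot names
tgt_slots = ["date", "time", "area", "attraction"]    # assumed slot names
src_w = np.random.default_rng(1).normal(size=(len(src_slots), 8))
tgt_w = transfer_initialize(src_w, src_slots, tgt_slots, hidden_dim=8)
```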
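
The Experiment Setup row quotes the paper's hyperparameters. The snippet below simply collects them in a configuration object and shows how the described reward scheme and epsilon-greedy selection could look in code; the class and function names are assumptions for illustration, not the authors' implementation, and the negative signs on r_failure and r_ongoing follow the paper's wording that these are negative rewards.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class ExperimentConfig:
    # Values quoted from the paper's experiment setup.
    n_epochs: int = 50                      # training epochs
    n_dialogues: int = 100                  # simulated dialogues per epoch
    warm_start_positive_fraction: float = 0.30  # buffer share of positive experiences
    flush_success_rate: float = 0.30        # flush buffer when success rate first reaches 0.3
    n_max_turns: int = 20                   # maximal allowed dialogue turns
    r_success: int = 40                     # reward for a successful dialogue
    r_failure: int = -20                    # reward for a failed dialogue
    r_ongoing: int = -1                     # per-turn reward during the conversation
    epsilon: float = 0.05                   # epsilon-greedy exploration rate

def turn_reward(cfg: ExperimentConfig, done: bool, success: bool) -> int:
    """Per-turn reward under the quoted scheme (assumed helper)."""
    if not done:
        return cfg.r_ongoing
    return cfg.r_success if success else cfg.r_failure

def choose_action(q_values, cfg: ExperimentConfig, rng=random) -> int:
    """Epsilon-greedy action selection with the quoted epsilon = 0.05 (assumed helper)."""
    if rng.random() < cfg.epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

# Example: pick an action from hypothetical Q-values and score an ongoing turn.
cfg = ExperimentConfig()
action = choose_action([0.1, 0.7, 0.2], cfg)
reward = turn_reward(cfg, done=False, success=False)   # -1 for an ongoing turn
```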