Learning Conversational Systems that Interleave Task and Non-Task Content

Authors: Zhou Yu, Alexander Rudnicky, Alan Black

IJCAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To test the effectiveness of the proposed framework, we developed a movie promotion dialog system. Experiments with human users indicate that a system that interleaves social and task content achieves a better task success rate and is also rated as more engaging compared to a pure task-oriented system.
Researcher Affiliation | Academia | Zhou Yu, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, zhouyu@cs.cmu.edu; Alan W Black, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, awb@cs.cmu.edu; Alexander I. Rudnicky, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, air@cs.cmu.edu
Pseudocode | No | The paper describes the framework and its components, including the use of Q-learning, but does not provide any structured pseudocode or algorithm blocks.
Open Source Code | Yes | We published the source code of the software implementation of the framework, an example movie promotion system, and the conversation data collected with human users. https://github.com/echoyuzhou/ticktock_text_api
Open Datasets | Yes | A keyword retrieval method trained on a CNN interview corpus [Yu et al., 2015b]. A skip-thought vector model [Kiros et al., 2015] trained on the Movie Subtitle dataset [Lison and Tiedemann, 2016].
Dataset Splits | No | The paper mentions training and testing but does not explicitly provide details about a validation dataset split or cross-validation setup.
Hardware Specification | No | The paper does not specify any particular hardware components such as CPU or GPU models used for experiments.
Software Dependencies | No | The paper describes the components of the framework and algorithms used (e.g., reinforcement learning, Q-learning), but it does not specify any software dependencies with version numbers.
Experiment Setup | Yes | In a reinforcement learning setting, we formulate the problem as (S, A, R, γ, α), where S is a set of states that represents the system's environment, in this case the conversation history so far. A is a set of actions available per state. ... The action that is optimal for each state is the action that has the highest long-term reward. This reward is a weighted sum of the expected values of the rewards of all future steps starting from the current state, where the discount factor γ ∈ (0, 1) trades off the importance of sooner versus later rewards. ... It took 200, 1000, and 8000 conversations respectively for the Task-Global, Mix-Local, and Mix-Global systems to converge.
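The quoted experiment setup describes a standard tabular Q-learning formulation over dialog states, parameterized by (S, A, R, γ, α). As a minimal sketch of what such an update could look like (this is not the paper's released implementation; the state labels, action names, and class interface below are hypothetical illustrations), a policy that chooses between task and social actions might be written as:

```python
import random
from collections import defaultdict

# Hypothetical action set: pursue the task, interleave social content, or end the dialog.
ACTIONS = ["task", "social", "end"]

class QLearningPolicy:
    """Tabular Q-learning over dialog states, following a (S, A, R, gamma, alpha) setup."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor in (0, 1): trades off sooner vs. later rewards
        self.epsilon = epsilon  # exploration rate for epsilon-greedy action selection
        self.q = defaultdict(float)  # Q[(state, action)] -> estimated long-term reward

    def choose_action(self, state):
        # Explore occasionally; otherwise pick the action with the highest Q-value.
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Q-learning backup: move Q(s, a) toward r + gamma * max_a' Q(s', a').
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

# Example usage with made-up state labels:
policy = QLearningPolicy()
action = policy.choose_action("greeting")
policy.update("greeting", action, reward=1.0, next_state="task_in_progress")
```

The convergence figures quoted above (200, 1000, and 8000 conversations) reflect how many training dialogs each system variant needed before such Q-value estimates stabilized.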