Learning Conversational Systems that Interleave Task and Non-Task Content
Authors: Zhou Yu, Alexander Rudnicky, Alan Black
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To test the effectiveness of the proposed framework, we developed a movie promotion dialog system. Experiments with human users indicate that a system that interleaves social and task content achieves a better task success rate and is also rated as more engaging compared to a pure task-oriented system. |
| Researcher Affiliation | Academia | Zhou Yu, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, zhouyu@cs.cmu.edu; Alan W Black, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, awb@cs.cmu.edu; Alexander I. Rudnicky, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, air@cs.cmu.edu |
| Pseudocode | No | The paper describes the framework and its components, including the use of Q-learning, but does not provide any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We published the source code of the software implementation of the framework, an example movie promotion system, and the conversation data collected with human users: https://github.com/echoyuzhou/ticktock_text_api |
| Open Datasets | Yes | A keyword retrieval method trained on a CNN interview corpus [Yu et al., 2015b]; a skip-thought vector model [Kiros et al., 2015] trained on the Movie Subtitle dataset [Lison and Tiedemann, 2016]. |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly provide details about a validation dataset split or cross-validation setup. |
| Hardware Specification | No | The paper does not specify any particular hardware components such as CPU or GPU models used for experiments. |
| Software Dependencies | No | The paper describes the components of the framework and algorithms used (e.g., reinforcement learning, Q-learning), but it does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | In a reinforcement learning setting, we formulate the problem as (S, A, R, γ, α), where S is a set of states that represents the system's environment, in this case the conversation history so far. A is a set of actions available per state. ... The action that is optimal for each state is the action that has the highest long-term reward. This reward is a weighted sum of the expected values of the rewards of all future steps starting from the current state, where the discount factor γ ∈ (0, 1) trades off the importance of sooner versus later rewards. ... It took 200, 1000, and 8000 conversations respectively for the Task Global, Mix-Local, and Mix-Global systems to converge. |
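
The experiment-setup row quotes the paper's reinforcement-learning formulation (state = conversation history, actions chosen per state, discount factor γ, learning rate α, policies trained with Q-learning over simulated conversations). As a rough illustration only, and not the authors' released code, a minimal tabular Q-learning loop in this spirit might look like the sketch below; the action names, hyperparameter values, and the `env` user-simulator interface are placeholder assumptions.

```python
import random
from collections import defaultdict

# Illustrative tabular Q-learning sketch for a dialog policy that chooses
# between task and social (non-task) actions. The state encoding, action
# set, and reward values are placeholders, not the paper's exact ones.

ACTIONS = ["task_prompt", "social_chat"]  # hypothetical action set
GAMMA = 0.9      # discount factor: trades off sooner vs. later rewards
ALPHA = 0.1      # learning rate
EPSILON = 0.2    # exploration rate

Q = defaultdict(float)  # Q[(state, action)] -> estimated long-term reward


def choose_action(state):
    """Epsilon-greedy action selection over the current Q estimates."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def q_update(state, action, reward, next_state):
    """Standard one-step Q-learning update."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])


def run_episode(env):
    """One simulated conversation; `env` is a hypothetical user simulator
    exposing reset() and step(action) -> (next_state, reward, done)."""
    state = env.reset()
    done = False
    while not done:
        action = choose_action(state)
        next_state, reward, done = env.step(action)
        q_update(state, action, reward, next_state)
        state = next_state
```

In a loop like this, the paper's reported convergence figures (200, 1000, and 8000 conversations for the Task Global, Mix-Local, and Mix-Global systems) would correspond to the number of `run_episode` calls needed before the learned policy stabilizes.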