Towards Building Large Scale Multimodal Domain-Aware Conversation Systems

Authors: Amrita Saha, Mitesh M. Khapra, Karthik Sankaranarayanan

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To overcome this bottleneck, in this paper we introduce the task of multimodal, domain-aware conversations, and propose the MMD benchmark dataset. This dataset was gathered by working in close coordination with a large number of domain experts in the retail domain. These experts suggested various conversation flows and dialog states that are typically seen in multimodal conversations in the fashion domain. Keeping these flows and states in mind, we created a dataset consisting of over 150K conversation sessions between shoppers and sales agents, with the help of in-house annotators, using a semi-automated, manually intensive iterative process. With this dataset, we propose 5 new sub-tasks for multimodal conversations along with their evaluation methodology. We also propose two multimodal neural models in the encode-attend-decode paradigm and demonstrate their performance on two of the sub-tasks, namely text response generation and best image response selection. These experiments serve to establish baseline performance and open new research directions for each of these sub-tasks.
Researcher Affiliation | Collaboration | Amrita Saha, IBM Research AI and Indian Institute of Technology Madras, India (amrsaha4@in.ibm.com); Mitesh M. Khapra, Indian Institute of Technology Madras, India (miteshk@cse.iitm.ac.in); Karthik Sankaranarayanan, IBM Research AI (kartsank@in.ibm.com)
Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper provides a link to the dataset and to data-extraction scripts, but not to the source code of the proposed models (the multimodal HRED models) described in its methodology.
Open Datasets | Yes | To facilitate further research on multimodal systems, the MMD dataset created as a part of this work will be made available at https://github.com/iitm-nlp-miteshk/AmritaSaha/tree/master/MMD (please copy-paste the URL into a browser instead of clicking on it). This URL will contain the following resources: the train, valid, and test splits of the two versions of the MMD dataset, and the script to extract the state-wise data for each of the states elaborated in Table 2.
Dataset Splits | Yes | Dataset statistics:
    Statistic                      Train     Valid    Test
    #Dialogs (chat sessions)       105,439   22,595   22,595
    Proportion in terms of dialogs 70%       15%      15%
    Avg. #Utterances per dialog    40        40       40
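As a quick sanity check on the reported split sizes (an illustrative script, not from the paper), the per-split dialog counts sum to the "over 150K" sessions claimed in the abstract, and the proportions round to the stated 70/15/15:

```python
# Dialog counts per split, as reported in the review above.
splits = {"train": 105_439, "valid": 22_595, "test": 22_595}

total = sum(splits.values())
print(total)  # 150629 -- consistent with "over 150K conversation sessions"

# Each split's share of the total, rounded to whole percent.
for name, count in splits.items():
    print(name, round(100 * count / total))  # 70, 15, 15
```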
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific GPU or CPU models.
Software Dependencies | No | The paper mentions software components such as GRU cells, VGGNet-16, and the Adam optimization algorithm, but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | We used the Adam optimization algorithm and tuned the following hyperparameters using the validation set: learning rate {1e-3, 4e-4}, RNN hidden unit size {256, 512}, text and image embedding size {256, 512}, batch size {32, 64}, and dialog context size {2, 5, 10}. The bracketed sets indicate the values considered for each hyperparameter.
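The quoted search space can be enumerated explicitly. A minimal sketch of the grid (hyperparameter names here are paraphrased labels, not identifiers from the authors' code):

```python
from itertools import product

# Hyperparameter values quoted in the paper's experiment setup.
grid = {
    "learning_rate": [1e-3, 4e-4],
    "hidden_size": [256, 512],
    "embedding_size": [256, 512],  # shared text/image embedding size
    "batch_size": [32, 64],
    "context_size": [2, 5, 10],    # number of previous dialog turns
}

# Every candidate configuration for validation-set tuning.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(configs))  # 2 * 2 * 2 * 2 * 3 = 48 configurations
```

With 48 configurations, exhaustive grid search over the validation set is feasible, which matches the paper's description of tuning all listed hyperparameters jointly.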