Building Task-Oriented Dialogue Systems for Online Shopping

Authors: Zhao Yan, Nan Duan, Peng Chen, Ming Zhou, Jianshe Zhou, Zhoujun Li

AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Interesting and insightful observations are shown in the experimental part, based on the analysis of human-bot conversation logs. Several current challenges are also pointed out as our future directions. We evaluate product category detection in two settings. In the Off-line setting, we first transform each ⟨q, url_p⟩ pair (6,315,233 in total) to a ⟨q, C_p⟩ pair... Evaluation results are shown in Table 6...
Researcher Affiliation | Collaboration | Zhao Yan, Nan Duan, Peng Chen, Ming Zhou, Jianshe Zhou and Zhoujun Li. State Key Lab of Software Development Environment, Beihang University, Beijing, China; Microsoft Research, Beijing, China; Microsoft Xiaoice Team, Beijing, China; BAICIT, Capital Normal University, Beijing, China. {yanzhao, lizj}@buaa.edu.cn; {nanduan, peche, mingzhou}@microsoft.com; {zhoujs}@cnu.edu.cn
Pseudocode | Yes | Algorithm 1: Intent Phrase Mining... Algorithm 2: Product Attribute Extraction... Algorithm 3: Global Search
Open Source Code | No | The paper does not provide a link to a source code repository or an explicit statement that the code for the described methodology is released. It mentions using existing resources and methodologies but does not offer the authors' implementation.
Open Datasets | Yes | We crawl raw questions from Baidu Zhidao. After filtering the full question set based on the product knowledge base, there are 3,146,063 questions left in QD.
Dataset Splits | Yes | We evaluate product category detection in two settings. In the Off-line setting, we first transform each ⟨q, url_p⟩ pair (6,315,233 in total) to a ⟨q, C_p⟩ pair by finding p's category in the product knowledge base. Next, we split the ⟨q, C_p⟩ pairs into a training set (8/10), a dev set (1/10) and a test set (1/10). A minimal split sketch appears below the table.
Hardware Specification | No | The paper does not specify any hardware details such as CPU, GPU models, memory, or cloud computing instances used for conducting the experiments.
Software Dependencies | No | The paper describes the use of a 'CNN-based approach' and 'stochastic gradient descent (SGD)' for training but does not provide specific software dependencies like library names with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x).
Experiment Setup | Yes | Input Layer. Traditionally, each word after tokenization can be represented by a one-hot word vector whose dimensionality equals the vocabulary size... We then obtain the representation of the t-th word n-gram in an utterance Q by concatenating the character vectors of each word as l_t = [w_{t-d}^T, ..., w_t^T, ..., w_{t+d}^T]^T, where w_t denotes the t-th word representation and n = 2d + 1 denotes the contextual window size, which is set to 3. The model is trained by maximizing the likelihood of the correctly associated product categories given training utterances, using stochastic gradient descent (SGD). The minimum support threshold is set to 5 for Frequent Phrase Mining, and the topic size is set to 1,000 for Phrase LDA. A sketch of the window construction appears below the table.
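
The following minimal Python sketch makes the 8/10 : 1/10 : 1/10 split from the Dataset Splits row concrete. It is an illustration under assumptions, not the authors' code: the function name split_pairs, the fixed seed, and the dummy (question, category) tuples are all hypothetical.

    # Hypothetical sketch of the 8/10 train, 1/10 dev, 1/10 test split
    # described in the paper; the authors' splitting script is not released.
    import random

    def split_pairs(pairs, seed=0):
        """Shuffle (q, C_p) pairs, then split them 8/10 : 1/10 : 1/10."""
        pairs = list(pairs)
        random.Random(seed).shuffle(pairs)
        n_train, n_dev = len(pairs) * 8 // 10, len(pairs) // 10
        train = pairs[:n_train]
        dev = pairs[n_train:n_train + n_dev]
        test = pairs[n_train + n_dev:]
        return train, dev, test

    # Example with dummy (question, category) pairs:
    train, dev, test = split_pairs([(f"q{i}", "beauty") for i in range(1000)])
    print(len(train), len(dev), len(test))  # 800 100 100

Shuffling before slicing keeps the three sets disjoint while drawing them from the same distribution; the paper does not say whether its split was random, so the shuffle is a design assumption.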
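Similarly, the formula quoted in the Experiment Setup row, l_t = [w_{t-d}^T, ..., w_t^T, ..., w_{t+d}^T]^T with n = 2d + 1 = 3, can be illustrated by the sketch below, which builds these window vectors. Dense word vectors and zero-padding at the utterance edges are assumptions (the paper derives its word representations from character vectors), and window_representations is a hypothetical name.

    # Hypothetical sketch: build l_t = [w_{t-d}; ...; w_t; ...; w_{t+d}]
    # for every position t, with window size n = 2d + 1 = 3 as in the paper.
    # Dense vectors and zero-padding at the edges are assumptions.
    import numpy as np

    def window_representations(word_vectors, d=1):
        """word_vectors: (num_words, dim) array; returns (num_words, n * dim)."""
        dim = word_vectors.shape[1]
        pad = np.zeros((d, dim))                  # zero-pad the utterance edges
        padded = np.vstack([pad, word_vectors, pad])
        return np.stack([padded[t:t + 2 * d + 1].reshape(-1)  # concatenate window
                         for t in range(word_vectors.shape[0])])

    # Example: 5 words with 4-dim vectors -> five (3 * 4 = 12)-dim windows.
    l = window_representations(np.random.rand(5, 4))
    print(l.shape)  # (5, 12)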