NaturalConv: A Chinese Dialogue Dataset Towards Multi-turn Topic-driven Conversation
Authors: Xiaoyang Wang, Chen Li, Jianqiao Zhao, Dong Yu (pp. 14006-14014)
AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To facilitate the research on this corpus, we provide results of several benchmark models. Comparative results show that for this dataset, our current models are not able to provide significant improvement by introducing background knowledge/topic. Therefore, the proposed dataset should be a good benchmark for further research to evaluate the validity and naturalness of multi-turn conversation systems. We also conduct extensive experiments on this corpus to facilitate future research. |
| Researcher Affiliation | Industry | Xiaoyang Wang*, Chen Li*, Jianqiao Zhao, Dong Yu Tencent AI Lab, Bellevue, WA {shawnxywang, ailabchenli, markjzhao, dyu}@tencent.com |
| Pseudocode | No | The paper describes models and their components (e.g., Seq2Seq, GRU, LSTM, Transformer, BERT) but does not provide any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states, "Our dataset is available at https://ai.tencent.com/ailab/nlp/dialogue/#datasets." This link is specifically for the dataset, not the source code for the models or methods described in the paper. |
| Open Datasets | Yes | Our dataset is available at https://ai.tencent.com/ailab/nlp/dialogue/#datasets. |
| Dataset Splits | Yes | We split different documents and their corresponding dialogues from the NaturalConv corpus into the train, dev, and test sets, respectively. The total number of documents in different topics, as well as the total number of dialogue pairs for each set are presented in Table 6. The data split will be released together with the corpus. |
| Hardware Specification | Yes | The experiments are performed on Nvidia Tesla P40 GPUs. |
| Software Dependencies | No | The paper mentions "PyTorch", "LTP (Che, Li, and Liu 2010) Chinese word segmentation tool", and "bert-base-chinese BERT model released by (Devlin et al. 2018)", but it does not specify version numbers for PyTorch or LTP, which are key for reproducibility. |
| Experiment Setup | Yes | The Retrieval-BERT model re-ranks the top K = 10 retrieved responses. Our GRU network consists of the one-layer bi-directional GRU encoder and the one-layer GRU decoder. Its embedding size is set to 300, and the hidden state size is set to 800. The LSTM network consists of a two-layer bi-directional LSTM encoder and a two-layer LSTM decoder. Both the embedding size and the hidden state size of the LSTM model are set to 500. The Transformer model contains a six-layer encoder and a six-layer decoder, with the embedding size, hidden unit size, and attention head number set to 1024, 4096, and 16, respectively. ADAM is used to optimize the GRU, LSTM, and Transformer models, with the initial learning rate set to 5 × 10⁻⁵ for GRU, 1 × 10⁻³ for LSTM, and 5 × 10⁻⁴ for Transformer, respectively. See the configuration and re-ranking sketches below the table. |
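
To make the quoted hyperparameters concrete, the following is a minimal PyTorch sketch that collects them into configurations and builds the Transformer variant. The dictionary keys and the `build_transformer` helper are illustrative names, not the authors' code; the optimizer wiring simply follows the quoted Adam learning rates.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the paper, gathered into illustrative configs.
# The dictionary keys and field names are hypothetical, not the authors' code.
CONFIGS = {
    "gru":         dict(enc_layers=1, dec_layers=1, embed_size=300,
                        hidden_size=800, lr=5e-5),
    "lstm":        dict(enc_layers=2, dec_layers=2, embed_size=500,
                        hidden_size=500, lr=1e-3),
    "transformer": dict(enc_layers=6, dec_layers=6, embed_size=1024,
                        ffn_size=4096, heads=16, lr=5e-4),
}


def build_transformer(vocab_size: int):
    """Build a six-layer encoder/decoder Transformer with the quoted sizes."""
    cfg = CONFIGS["transformer"]
    embedding = nn.Embedding(vocab_size, cfg["embed_size"])
    model = nn.Transformer(
        d_model=cfg["embed_size"],            # 1024
        nhead=cfg["heads"],                   # 16 attention heads
        num_encoder_layers=cfg["enc_layers"],
        num_decoder_layers=cfg["dec_layers"],
        dim_feedforward=cfg["ffn_size"],      # 4096 hidden units
        batch_first=True,
    )
    # ADAM with the initial learning rate reported for the Transformer model.
    optimizer = torch.optim.Adam(
        list(embedding.parameters()) + list(model.parameters()), lr=cfg["lr"])
    return embedding, model, optimizer
```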
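
The Retrieval-BERT re-ranking step (top K = 10 candidates, bert-base-chinese backbone) could look roughly like the sketch below. This assumes a Hugging Face-style interface, which the paper does not specify, and `rerank` is a hypothetical helper; in practice the classification head would be fine-tuned on response-selection data before use.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Assumes the Hugging Face distribution of the bert-base-chinese checkpoint;
# the paper only names the checkpoint, not the loading library.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
scorer = BertForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=1)
scorer.eval()


def rerank(context: str, candidates: list[str], top_k: int = 10) -> list[str]:
    """Score the top-K retrieved responses against the dialogue context and
    return them ordered by the (here untrained) relevance head."""
    candidates = candidates[:top_k]
    inputs = tokenizer([context] * len(candidates), candidates,
                       padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        scores = scorer(**inputs).logits.squeeze(-1)
    order = torch.argsort(scores, descending=True)
    return [candidates[i] for i in order]
```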