Exploring Implicit Feedback for Open Domain Conversation Generation

Authors: Wei-Nan Zhang, Lingzhi Li, Dongyan Cao, Ting Liu

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that the proposed approach outperforms the Seq2Seq model and the state-of-the-art reinforcement learning model for conversation generation in automatic and human evaluations on the Open Subtitles and Twitter datasets.
Researcher Affiliation | Academia | Wei-Nan Zhang, Lingzhi Li, Dongyan Cao, Ting Liu; Research Center for Social Computing and Information Retrieval, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China. {wnzhang, lzli, dycao, tliu}@ir.hit.edu.cn
Pseudocode | No | The paper contains no structured pseudocode or algorithm blocks; the model and training procedure are described in prose and mathematical equations.
Open Source Code | No | The paper does not provide a link to, or an explicit statement about, open-sourcing the code for its method. It only links to a Twitter conversation corpus dataset.
Open Datasets | Yes | The first is the Open Subtitles dataset (Tiedemann 2009), which is also used in (Vinyals and Le 2015; Li et al. 2016c). The second is a Twitter conversation corpus (https://github.com/Marsan-Ma/chat_corpus) that contains 754,530 messages. 44 million and 376,265 conversation pairs from the two datasets are used for training the Seq2Seq models. (A loading sketch appears after the table.)
Dataset Splits | No | The paper refers to data used for training, for initializing the conversation simulations, and for the test simulation, but it does not specify a validation split or how one was used. (See the split sketch after the table.)
Hardware Specification | No | The paper does not report the hardware (CPU or GPU models, memory, cloud instances) used to run the experiments.
Software Dependencies | No | The paper mentions a Seq2Seq model, an RNN-LSTM model, and policy-gradient training, but gives no version numbers for any software, libraries, or frameworks. (A model sketch appears after the table.)
Experiment Setup | Yes | Training runs for 50 and 124 epochs on the Open Subtitles and Twitter datasets, respectively. The batch size is 128, the maximum input sequence length is 60 words, the Seq2Seq hidden state size is 128, the vocabulary size is 100,000, and the beam size for decoding is 10. For training and testing the simulation, the numbers of simulated conversation turns are 5 and 8, respectively. λ1, λ2, and λ3 in Equation (5) are 0.4, 0.4, and 0.2; δ in Equation (6) is 0.5. (These values are collected in the configuration sketch after the table.)
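The Twitter corpus linked in the Open Datasets row is distributed through the Marsan-Ma/chat_corpus repository, and the paper documents neither its preprocessing nor a validation split. The snippet below is only a minimal sketch of how a reproducer might pair up utterances and hold out a validation set; the file name `twitter_en.txt.gz`, the alternating query/reply line layout, and the 5% hold-out ratio are assumptions, not details from the paper.

```python
import gzip
import random

def load_pairs(path):
    """Read alternating query/reply lines into (query, reply) pairs.

    Assumes one utterance per line with pairs on consecutive lines
    (an assumption about the corpus layout, not a detail from the paper)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="ignore") as f:
        lines = [line.strip() for line in f if line.strip()]
    return list(zip(lines[0::2], lines[1::2]))

def split_pairs(pairs, valid_frac=0.05, seed=42):
    """Shuffle and hold out a small validation fraction (ratio is our choice)."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_frac)
    return shuffled[n_valid:], shuffled[:n_valid]

# Hypothetical usage with a local copy of the Twitter corpus:
# pairs = load_pairs("twitter_en.txt.gz")
# train_pairs, valid_pairs = split_pairs(pairs)
```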
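Because no framework or version is named, any reimplementation has to pick its own stack. The sketch below uses PyTorch (our choice) to show the general shape of an LSTM-based Seq2Seq model with the reported hidden size and vocabulary, plus a generic REINFORCE-style policy-gradient loss. It is not the paper's implementation, and the reward terms from Equations (5) and (6) are deliberately left abstract.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal LSTM encoder-decoder; PyTorch is our choice of framework."""
    def __init__(self, vocab_size=100_000, hidden_size=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.encoder = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.decoder = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, src_ids, tgt_ids):
        _, state = self.encoder(self.embed(src_ids))         # encode the input turn
        dec_out, _ = self.decoder(self.embed(tgt_ids), state)
        return self.out(dec_out)                              # (batch, tgt_len, vocab) logits

def pg_loss(logits, sampled_ids, reward):
    """Generic REINFORCE-style loss: reward-weighted negative log-likelihood of a
    sampled response. The paper's reward (Equations (5)-(6)) is not reproduced here."""
    log_probs = torch.log_softmax(logits, dim=-1)
    token_logp = log_probs.gather(-1, sampled_ids.unsqueeze(-1)).squeeze(-1)
    return -(reward * token_logp.sum(dim=-1)).mean()
```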
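The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object, which makes the reported values easy to scan. This is a minimal sketch; the class and field names below are ours, not the paper's.

```python
from dataclasses import dataclass

@dataclass
class Seq2SeqConfig:
    """Hypothetical container for the hyperparameters reported in the paper."""
    epochs_opensubtitles: int = 50            # training epochs on Open Subtitles
    epochs_twitter: int = 124                 # training epochs on Twitter
    batch_size: int = 128
    max_input_len: int = 60                   # maximum input sequence length (words)
    hidden_size: int = 128                    # Seq2Seq hidden state size
    vocab_size: int = 100_000
    beam_size: int = 10                       # beam width used for decoding
    sim_turns_train: int = 5                  # simulated conversation turns (training)
    sim_turns_test: int = 8                   # simulated conversation turns (test)
    reward_lambdas: tuple = (0.4, 0.4, 0.2)   # λ1, λ2, λ3 in Equation (5)
    delta: float = 0.5                        # δ in Equation (6)

config = Seq2SeqConfig()
```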