Elastic Responding Machine for Dialog Generation with Dynamically Mechanism Selecting

Authors: Ganbin Zhou, Ping Luo, Yijun Xiao, Fen Lin, Bo Chen, Qing He

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | The experiments demonstrate the quality and diversity of the generated responses, intuitively show how the learned model controls the response mechanism when responding, and reveal some underlying relationships between mechanism and language style. We utilize the dataset in (Zhou et al. 2017) for experiments, which is collected from Tencent Weibo. In total, there are 815,852 pairs, among which 775,852 are for training, and 40,000 for model validation. We summarize the experimental results in Table 1.
Researcher Affiliation | Collaboration | Ganbin Zhou,1,2 Ping Luo,1,2 Yijun Xiao,3 Fen Lin,4 Bo Chen,4 Qing He1,2. 1Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China. {zhouganbin, luop, heqing}@ict.ac.cn. 2University of Chinese Academy of Sciences, Beijing 100049, China. 3Department of Computer Science, University of California Santa Barbara, Santa Barbara, CA 93106, USA. 4WeChat Search Application Department, Tencent, China.
Pseudocode | Yes | Algorithm 1 FILTER(x, S). Input: post x; total mechanism set S. Output: selected mechanisms for input x, S_x. (A hedged sketch of this interface is given after the table.)
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It contains no specific repository link or explicit code release statement, nor does it indicate that code for the proposed model is available in supplementary materials.
Open Datasets | Yes | We utilize the dataset in (Zhou et al. 2017) for experiments, which is collected from Tencent Weibo (http://t.qq.com/?lang=en_US).
Dataset Splits | Yes | In total, there are 815,852 pairs, among which 775,852 are for training, and 40,000 for model validation. We stop training after the error over the validation set does not decrease for 7 consecutive epochs. For each model, the parameters with the largest likelihood on the validation set are selected for final comparison. (See the split and early-stopping sketch after the table.)
Hardware Specification | No | The paper mentions "GPU memory" once in a general context but does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed machine specifications) used for running its experiments.
Software Dependencies | No | The paper states "We implement models using Theano (Theano Development Team 2016)." and refers to GRU and ADADELTA, but it does not specify version numbers for these software components; the Theano citation year is not a version number.
Experiment Setup | Yes | We use a vocabulary of 28,000 Chinese words in coarse-grained segmentation. The dimension of the word embedding is 128, the dimension of the hidden state is 1024, and a one-layer RNN with GRU (Cho et al. 2014) activation is utilized. For initialization, parameters are sampled from a uniform distribution between -0.01 and 0.01. For training, ADADELTA (Zeiler 2012; Graves 2013) is used for optimization. We stop training after the error over the validation set does not decrease for 7 consecutive epochs. For generating responses, beam search with beam size 200 is applied. (A hyperparameter sketch follows the table.)
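
The paper only reports the signature of Algorithm 1, FILTER(x, S). As a minimal sketch of that interface, the snippet below assumes a post-conditioned relevance score and a fixed top-k cutoff; both `score` and `k` are hypothetical placeholders, not the paper's actual selection rule.

```python
# Hedged sketch of the FILTER(x, S) interface from Algorithm 1.
# The scoring function and the top-k cutoff are assumed placeholders;
# the paper's actual mechanism-selection criterion may differ.

def filter_mechanisms(x, S, score, k=5):
    """Select a subset S_x of mechanisms from the total set S for post x.

    x     -- the input post (e.g., a token sequence or its encoding)
    S     -- the total set of mechanism identifiers / embeddings
    score -- assumed callable score(x, m) -> float, relevance of mechanism m to x
    k     -- assumed cutoff: keep the k highest-scoring mechanisms
    """
    ranked = sorted(S, key=lambda m: score(x, m), reverse=True)
    S_x = ranked[:k]
    return S_x
```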
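
To make the reported split and stopping rule concrete, here is a minimal sketch assuming a simple shuffle-then-slice split over the 815,852 post-response pairs and a patience counter of 7 epochs; `run_epoch` and `validation_error` are hypothetical callables, not the authors' code.

```python
import random

# Minimal sketch of the reported data split (775,852 train / 40,000 validation
# pairs) and the patience-7 early-stopping rule described in the paper.

def split_pairs(pairs, n_train=775_852, n_valid=40_000, seed=0):
    """Shuffle post-response pairs and slice them into train/validation sets."""
    rng = random.Random(seed)
    pairs = list(pairs)          # copy so the caller's list is untouched
    rng.shuffle(pairs)
    return pairs[:n_train], pairs[n_train:n_train + n_valid]

def train_with_early_stopping(run_epoch, validation_error, patience=7):
    """Run epochs until validation error has not improved for `patience` epochs.

    run_epoch()        -- placeholder: trains the model for one epoch
    validation_error() -- placeholder: returns current error on the validation set
    """
    best_err, stale_epochs = float("inf"), 0
    while stale_epochs < patience:
        run_epoch()
        err = validation_error()
        if err < best_err:
            best_err, stale_epochs = err, 0   # improvement: reset the counter
        else:
            stale_epochs += 1
    return best_err
```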
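
The reported setup can also be restated as a configuration. The sketch below collects the numbers quoted above and shows uniform initialization over [-0.01, 0.01] (the negative lower bound is assumed, since the extracted text dropped the minus sign); it is illustrative, not the authors' Theano implementation.

```python
import numpy as np

# Restatement of the reported hyperparameters as a plain config dict.
config = {
    "vocab_size": 28_000,            # coarse-grained Chinese word vocabulary
    "embedding_dim": 128,
    "hidden_dim": 1024,
    "rnn": "GRU",                    # one-layer RNN with GRU cells
    "optimizer": "ADADELTA",
    "early_stopping_patience": 7,    # epochs without validation improvement
    "beam_size": 200,
    "init_range": (-0.01, 0.01),     # minus sign assumed (lost in extraction)
}

def init_weights(shape, low=-0.01, high=0.01):
    """Sample initial parameters from a uniform distribution."""
    return np.random.uniform(low, high, size=shape).astype("float32")

# Example: initialize the word-embedding matrix with the reported dimensions.
embedding = init_weights((config["vocab_size"], config["embedding_dim"]))
```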