RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain Dialog Systems

Authors: Chongyang Tao, Lili Mou, Dongyan Zhao, Rui Yan

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on both retrieval and generative dialog systems show that RUBER has a high correlation with human annotation, and that RUBER has fair transferability over different datasets.
Researcher Affiliation | Academia | 1 Institute of Computer Science and Technology, Peking University, China; 2 David R. Cheriton School of Computer Science, University of Waterloo; 3 Beijing Institute of Big Data Research, China. Emails: {chongyangtao,zhaody,ruiyan}@pku.edu.cn, doublepower.mou@gmail.com
Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code, nor a link to a code repository for the described methodology.
Open Datasets | No | We crawled massive data from an online Chinese forum Douban. The training set contains 1,449,218 samples, each of which consists of a query-reply pair.
Dataset Splits | No | The paper mentions a training set and a set of 300 samples for human evaluation, but it does not provide conventional training/validation/test splits (e.g., percentages or counts for a distinct validation set).
Hardware Specification | No | The paper does not provide any specific hardware details, such as GPU/CPU models or the computing resources used for the experiments.
Software Dependencies | No | The paper mentions word2vec embeddings and the Adam optimizer but does not specify their versions, or the versions of any other software libraries or frameworks used.
Experiment Setup | Yes | In the referenced metric, we trained 50-dimensional word2vec embeddings on the Douban dataset. For the unreferenced metric, the dimension of RNN layers was set to 500. The training objective is to minimize J = max(0, Δ − s_U(q, r) + s_U(q, r⁻)) (Eq. 3), where r⁻ denotes a randomly sampled negative reply. We train model parameters with backpropagation using Adam (Kingma and Ba 2015). ... margin Δ (set to 0.05 by validation).
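
The quoted setup covers RUBER's two halves: a referenced metric that compares pooled word2vec sentence vectors by cosine similarity, and an unreferenced metric trained with the hinge objective of Eq. 3. Below is a minimal PyTorch sketch of both pieces; the max pooling, function names, dummy score tensors, and all dimensions other than the 50-dimensional embeddings are illustrative assumptions, since the paper's exact pooling scheme and RNN scorer are not released as code.

```python
import torch
import torch.nn.functional as F

# Referenced metric (sketch): pool word2vec vectors into a sentence vector
# and compare the generated reply with the ground-truth reply by cosine
# similarity. Simple max pooling stands in for the paper's pooling scheme
# (assumption).
def referenced_score(gen_emb: torch.Tensor, ref_emb: torch.Tensor) -> torch.Tensor:
    """gen_emb, ref_emb: (num_words, dim) matrices of word embeddings."""
    v_gen = gen_emb.max(dim=0).values
    v_ref = ref_emb.max(dim=0).values
    return F.cosine_similarity(v_gen, v_ref, dim=0)

# Unreferenced metric objective (Eq. 3): a hinge loss pushing the score of
# a true query-reply pair above a randomly sampled negative reply by at
# least the margin Delta (0.05 in the paper).
def hinge_loss(score_pos: torch.Tensor, score_neg: torch.Tensor,
               margin: float = 0.05) -> torch.Tensor:
    return torch.clamp(margin - score_pos + score_neg, min=0.0).mean()

# Dummy usage: 50-dimensional embeddings as in the paper; the scores would
# really come from the RNN-based matching network (here: random tensors).
gen, ref = torch.randn(7, 50), torch.randn(9, 50)
print(float(referenced_score(gen, ref)))

s_pos = torch.randn(32, requires_grad=True)  # s_U(q, r) for true replies
s_neg = torch.randn(32, requires_grad=True)  # s_U(q, r^-) for negatives
hinge_loss(s_pos, s_neg).backward()
```

Ranking true replies above sampled negatives in this way requires no human relevance labels, which is what makes the unreferenced half of RUBER unsupervised.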