RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain Dialog Systems
Authors: Chongyang Tao, Lili Mou, Dongyan Zhao, Rui Yan
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both retrieval and generative dialog systems show that RUBER has a high correlation with human annotation, and that RUBER has fair transferability over different datasets. |
| Researcher Affiliation | Academia | (1) Institute of Computer Science and Technology, Peking University, China; (2) David R. Cheriton School of Computer Science, University of Waterloo; (3) Beijing Institute of Big Data Research, China. {chongyangtao,zhaody,ruiyan}@pku.edu.cn, doublepower.mou@gmail.com |
| Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | No | We crawled massive data from an online Chinese forum Douban. The training set contains 1,449,218 samples, each of which consists of a query-reply pair. |
| Dataset Splits | No | The paper mentions a training set and a set of 300 samples for human evaluation, but it does not provide conventional training/validation/test splits (e.g., percentages or counts for a distinct validation set). |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models or types of computing resources used for the experiments. |
| Software Dependencies | No | The paper mentions 'word2vec embeddings' and the 'Adam' optimizer but does not specify version numbers for these or for any other software libraries or frameworks used. |
| Experiment Setup | Yes | In the referenced metric, we trained 50-dimensional word2vec embeddings on the Douban dataset. For the unreferenced metric, the dimension of RNN layers was set to 500. The training objective is to minimize J = max(0, Δ − s_U(q, r) + s_U(q, r⁻)) (Eq. 3), where r⁻ is a randomly sampled negative reply. We train model parameters with Adam (Kingma and Ba 2015) with backpropagation. ... margin Δ (set to 0.05 by validation). |
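The quoted setup combines a referenced metric (similarity between a reply and the ground truth) with an unreferenced metric trained by a margin ranking objective. The sketch below is a minimal illustration under stated assumptions, not the authors' code: the max-pooling and scoring functions are simplified stand-ins, word vectors are supplied directly rather than looked up from trained word2vec embeddings, and `s_pos`/`s_neg` stand for the unreferenced scorer's outputs on a true and a negatively sampled reply.

```python
import math

DELTA = 0.05  # margin, set to 0.05 by validation per the paper


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def referenced_score(reply_vecs, groundtruth_vecs):
    """Referenced-metric sketch: cosine between max-pooled sentence vectors.

    RUBER pools word2vec embeddings of the reply and the ground truth;
    here the per-word vectors are passed in as toy values.
    """
    pool = lambda vecs: [max(dims) for dims in zip(*vecs)]
    return cosine(pool(reply_vecs), pool(groundtruth_vecs))


def hinge_loss(s_pos, s_neg, delta=DELTA):
    """Unreferenced-metric training objective (Eq. 3):
    J = max(0, delta - s_U(q, r) + s_U(q, r_neg)).
    The loss is zero once the true reply outscores the negative by the margin.
    """
    return max(0.0, delta - s_pos + s_neg)
```

For example, `hinge_loss(0.9, 0.2)` is zero (the positive reply already beats the negative by more than the margin), while `hinge_loss(0.3, 0.4)` yields a positive loss of about 0.15 that pushes the scorer to separate the pair.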