Dialogues Are Not Just Text: Modeling Cognition for Dialogue Coherence Evaluation

Authors: Xue Li, Jia Su, Yang Yang, Zipeng Gao, Xinyu Duan, Yi Guan

AAAI 2024

Reproducibility assessment: each item below gives the variable, the result, and the LLM response.
Research Type: Experimental. Experiments demonstrate the necessity of modeling human cognition for dialogue evaluation, and the proposed DCGEval shows stronger correlations with human judgments than other state-of-the-art evaluation metrics.
Researcher Affiliation: Collaboration. Xue Li (1*), Jia Su (2), Yang Yang (1), Zipeng Gao (3), Xinyu Duan (2), Yi Guan (1). Affiliations: (1) Faculty of Computing, Harbin Institute of Technology; (2) Huawei Cloud; (3) School of Computer Science and Technology, University of Science and Technology of China.
Pseudocode: No. The paper contains mathematical formulations and descriptions of its processes, but it does not include explicitly labeled pseudocode or algorithm blocks.
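For illustration, the components this report identifies elsewhere (an AMR parser, a GCN, a Transformer, an MLP; see the Open Source Code and Experiment Setup items) suggest the overall shape such a pseudocode block would take. The sketch below is our hedged reconstruction of that shape only; the function names, interfaces, and ordering are assumptions, not pseudocode from the paper.

```python
def dcgeval_score(context, response, amr_parser, gcn, transformer, mlp):
    """Hedged reconstruction of the scoring pipeline's shape, using only
    components named in this report (AMR parser, GCN, Transformer, MLP).
    All interfaces and the exact ordering are assumptions."""
    # Parse each utterance into an AMR (cognition) graph.
    graphs = [amr_parser(utt) for utt in context + [response]]
    # Encode every graph with the GCN, then fuse the graph encodings.
    encoded = [gcn(g) for g in graphs]
    fused = transformer(encoded)
    # Map the fused representation to a scalar coherence score.
    return mlp(fused)
```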
Open Source Code: No. The paper references a third-party AMR parser (https://github.com/bjascob/amrlib), but the authors do not provide any concrete access to source code for the methodology described in the paper.
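Although no author code is released, the referenced parser itself is publicly available. Below is a minimal sketch of parsing utterances into AMR graphs with amrlib, assuming a pretrained sentence-to-graph model has been installed per the amrlib documentation; the example utterances are ours, not from the paper.

```python
import amrlib

# Load a pretrained sentence-to-graph (StoG) model; amrlib distributes
# these models separately from the library itself.
stog = amrlib.load_stog_model()

# Parse dialogue utterances into PENMAN-notation AMR graph strings.
graphs = stog.parse_sents([
    "I moved to Boston last month.",
    "How do you like the city so far?",
])
for g in graphs:
    print(g)  # one AMR graph per input utterance
```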
Open Datasets: Yes. The paper uses two daily dialogue datasets, DailyDialog++ (Sai et al. 2020) and DailyDialog EVAL (Huang et al. 2020), as training data, and evaluates on ConvAI2 (Huang et al. 2020) and Empathetic Dialogues (Huang et al. 2020) as unseen datasets, both of which include substantial human scoring.
Dataset Splits: No. The paper mentions using DailyDialog++ and DailyDialog EVAL as training data and ConvAI2 and Empathetic Dialogues as unseen evaluation datasets, but it does not provide the split information (exact percentages, sample counts, or explicit predefined train/validation/test partitions) needed to reproduce the data partitioning.
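For context, the missing specification could be as small as a seeded partition. The sketch below shows one conventional way to pin such a split; the 80/10/10 ratios and the seed are illustrative assumptions, not values from the paper.

```python
import random

def split_dialogues(dialogues, train_frac=0.8, val_frac=0.1, seed=42):
    """Deterministic train/validation/test partition of a dialogue list.
    The ratios and the seed are hypothetical placeholders."""
    rng = random.Random(seed)
    indices = list(range(len(dialogues)))
    rng.shuffle(indices)
    n_train = int(train_frac * len(indices))
    n_val = int(val_frac * len(indices))
    train = [dialogues[i] for i in indices[:n_train]]
    val = [dialogues[i] for i in indices[n_train:n_train + n_val]]
    test = [dialogues[i] for i in indices[n_train + n_val:]]
    return train, val, test
```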
Hardware Specification: No. The paper describes the experimental setup and results but does not specify the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies: No. The paper mentions software components such as an AMR parser and Transformer models, but it does not pin version numbers for these or for other ancillary dependencies (e.g., Python 3.8, PyTorch 1.9).
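The missing information could have been captured with a short environment dump like the one below; the package list is a plausible guess for this kind of model, not one confirmed by the paper.

```python
import sys
import importlib.metadata as md

# Record interpreter and package versions so the environment can be
# recreated later. The package names below are assumptions.
print(f"python {sys.version.split()[0]}")
for pkg in ("torch", "transformers", "amrlib"):
    try:
        print(f"{pkg} {md.version(pkg)}")
    except md.PackageNotFoundError:
        print(f"{pkg} not installed")
```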
Experiment Setup: No. The paper describes the overall framework, its training objectives (MLR loss, KD-MSE loss), and its architectural components (GCN, Transformer, MLP), but the main text does not give concrete hyperparameter values such as learning rate, batch size, number of epochs, or optimizer settings.
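What is missing amounts to a small configuration block. The example below shows the kind of settings a reproducible report would pin down; every value is a hypothetical placeholder, not a number from the paper.

```python
# Hypothetical training configuration: every value is a placeholder
# illustrating what a reproducible experiment setup would specify.
config = {
    "optimizer": "AdamW",
    "learning_rate": 2e-5,
    "batch_size": 16,
    "num_epochs": 5,
    "loss_weights": {"mlr": 1.0, "kd_mse": 1.0},  # MLR and KD-MSE terms
    "seed": 42,
}
```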