Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
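As a rough illustration of the validation step the notice describes, the sketch below computes per-variable agreement between LLM-produced labels and manual labels. The function name, variable names, and toy labels are hypothetical; the actual metrics and methodology are those described in [1].

```python
# Hypothetical sketch of validating an LLM-based classifier against a
# manually labeled dataset. Names and example labels are illustrative,
# not taken from [1].

def per_variable_accuracy(llm_labels, manual_labels):
    """Compute agreement between LLM and manual labels for each
    reproducibility variable (e.g. 'Open Source Code')."""
    accuracy = {}
    for variable in manual_labels:
        pairs = list(zip(llm_labels[variable], manual_labels[variable]))
        correct = sum(1 for llm, manual in pairs if llm == manual)
        accuracy[variable] = correct / len(pairs)
    return accuracy

# Toy example: two variables, four manually labeled papers each.
llm = {"Open Source Code": ["Yes", "No", "Yes", "Yes"],
       "Open Datasets":    ["Yes", "Yes", "No", "No"]}
manual = {"Open Source Code": ["Yes", "No", "No", "Yes"],
          "Open Datasets":    ["Yes", "Yes", "No", "No"]}

print(per_variable_accuracy(llm, manual))
# {'Open Source Code': 0.75, 'Open Datasets': 1.0}
```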

An Empirical Study of Content Understanding in Conversational Question Answering

Authors: Ting-Rui Chiang, Hao-Tong Ye, Yun-Nung Chen (pp. 7578-7585)

AAAI 2020 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results indicate some potential hazards in the benchmark datasets, QuAC and CoQA, for conversational comprehension research. Our analysis also sheds light on both what models may learn and how datasets may bias the models.
Researcher Affiliation | Academia | Ting-Rui Chiang, Hao-Tong Ye, Yun-Nung Chen; National Taiwan University, Taipei, Taiwan; EMAIL, EMAIL
Pseudocode | No | The paper describes the models and experimental settings in text, but it does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code is available at https://github.com/MiuLab/CQA-Study.
Open Datasets | Yes | There are two benchmark conversational question answering datasets, QuAC (Choi et al. 2018) and CoQA (Reddy, Chen, and Manning 2019).
Dataset Splits | Yes | Model performance is reported on the validation sets of QuAC and CoQA.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper mentions utilizing existing models such as FlowQA, BERT, and SDNet, but it does not specify any software dependencies with version numbers (e.g., Python version, library versions such as PyTorch or TensorFlow) required for replication.
Experiment Setup | Yes | Each model in each setting is trained with 3 different random seeds, and the resulting mean and standard deviation are reported for reliability.
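The seed-averaging described in the Experiment Setup row can be sketched as follows. The scores are invented placeholders, not results from the paper; only the procedure (3 seeds, report mean and standard deviation) comes from the quoted text.

```python
# Minimal sketch (hypothetical scores, not from the paper) of reporting
# mean and standard deviation over 3 random-seed runs of one setting.
import statistics

# Hypothetical F1 scores from the same model/setting under 3 seeds.
seed_scores = [64.2, 65.1, 63.8]

mean = statistics.mean(seed_scores)
std = statistics.stdev(seed_scores)  # sample standard deviation

print(f"{mean:.2f} +/- {std:.2f}")
# 64.37 +/- 0.67
```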