An Empirical Study of Content Understanding in Conversational Question Answering

Authors: Ting-Rui Chiang, Hao-Tong Ye, Yun-Nung Chen

AAAI 2020, pp. 7578-7585

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results indicate some potential hazards in the benchmark datasets, QuAC and CoQA, for conversational comprehension research. Our analysis also sheds light on both what models may learn and how datasets may bias the models.
Researcher Affiliation | Academia | Ting-Rui Chiang, Hao-Tong Ye, Yun-Nung Chen; National Taiwan University, Taipei, Taiwan; {r07922052, r08922065}@csie.ntu.edu.tw, y.v.chen@ieee.org
Pseudocode | No | The paper describes the models and experimental settings in text, but it does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code is available at https://github.com/MiuLab/CQA-Study.
Open Datasets | Yes | There are two benchmark conversational question answering datasets, QuAC (Choi et al. 2018) and CoQA (Reddy, Chen, and Manning 2019). (A hedged loading sketch follows this table.)
Dataset Splits | Yes | Model performance is reported on the validation sets of QuAC and CoQA.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper mentions utilizing existing models such as FlowQA, BERT, and SDNet, but it does not specify any software dependencies with version numbers (e.g., Python version, or library versions for PyTorch or TensorFlow) required for replication.
Experiment Setup | Yes | Each model in each setting is trained with 3 different random seeds, and the resulting mean and standard deviation values are reported for reliability. (A minimal aggregation sketch follows this table.)
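
For the "Open Datasets" row above, the following is a minimal, hypothetical sketch of how the QuAC and CoQA validation splits could be pulled in for replication. The Hugging Face `datasets` hub identifiers "quac" and "coqa" are assumptions of this sketch, not something the paper specifies; the authors work from the official dataset releases.

```python
# Hypothetical sketch: fetch the QuAC and CoQA dev (validation) splits with the
# Hugging Face `datasets` library. The hub identifiers "quac" and "coqa" are
# assumptions of this sketch; the paper itself uses the official releases of
# Choi et al. (2018) and Reddy, Chen, and Manning (2019).
from datasets import load_dataset

quac_dev = load_dataset("quac", split="validation")  # one example per dialog
coqa_dev = load_dataset("coqa", split="validation")

print(f"QuAC dev dialogs: {len(quac_dev)}")
print(f"CoQA dev dialogs: {len(coqa_dev)}")
```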
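
For the "Experiment Setup" row, here is a minimal sketch of the reporting scheme described there: each model/setting pair is trained with three random seeds, and the mean and standard deviation of the resulting scores are reported. The model names reuse those mentioned in the paper, but the scores below are placeholders for illustration, not results from the paper.

```python
# Minimal sketch of reporting mean and standard deviation over 3 random seeds.
# The F1 values below are placeholders for illustration only.
import numpy as np

dev_f1_per_seed = {
    "FlowQA": [64.2, 63.8, 64.5],
    "BERT":   [63.1, 62.7, 63.4],
    "SDNet":  [59.9, 60.3, 60.1],
}

for model, scores in dev_f1_per_seed.items():
    scores = np.asarray(scores)
    # Population standard deviation over the 3 seeds (numpy's default ddof=0).
    print(f"{model}: {scores.mean():.1f} +/- {scores.std():.1f}")
```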