FlowQA: Grasping Flow in History for Conversational Machine Comprehension

Authors: Hsin-Yuan Huang, Eunsol Choi, Wen-tau Yih

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | FLOWQA achieves strong empirical results on conversational machine comprehension tasks, and improves the state of the art on various datasets (from 67.8% to 75.0% on CoQA and 60.1% to 64.1% on QuAC). (Section 4: Experiments: Conversational Machine Comprehension)
Researcher Affiliation | Collaboration | Hsin-Yuan Huang (California Institute of Technology, hsinyuan@caltech.edu); Eunsol Choi (University of Washington, eunsol@cs.washington.edu); Wen-tau Yih (Allen Institute for Artificial Intelligence, scottyih@allenai.org)
Pseudocode | No | The paper does not contain any clearly labeled "Pseudocode" or "Algorithm" blocks or figures.
Open Source Code | Yes | Our code can be found in https://github.com/momohuang/FlowQA.
Open Datasets | Yes | We experiment with the QuAC (Choi et al., 2018) and CoQA (Reddy et al., 2019) datasets.
Dataset Splits | Yes | The decision threshold is tuned on the development set to maximize the F1 score. The development and test set results are reported in Tables 4 and 5. (A minimal sketch of this kind of threshold sweep appears after the table.)
Hardware Specification | No | The paper discusses training time and speedup, but it does not specify the hardware used for the experiments, such as CPU or GPU models or memory capacity.
Software Dependencies | No | The paper mentions software such as PyTorch and spaCy, but it does not give version numbers for these or any other key software components, which would be needed to reproduce the environment.
Experiment Setup | Yes | We use a maximum of 20 epochs, with each epoch passing through the data once. It roughly takes 10 to 20 epochs to converge. All RNN output sizes are set to 125, and thus the BiRNN output is of size 250. The attention hidden size used in fully-aware attention is set to 250. During training, we use a dropout rate of 0.4 (Srivastava et al., 2014) after the embedding layer (GloVe, CoVe and ELMo) and before applying any linear transformation. The batch size is set to one dialog for CoQA, and three dialogs for QuAC. The optimizer is Adamax (Kingma & Ba, 2015) with a learning rate α = 0.002, β = (0.9, 0.999) and ϵ = 10^-8. A fixed random seed is used across all experiments.
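
The Experiment Setup row pins down most of the training hyperparameters. Below is a minimal PyTorch sketch of that configuration; the layer sizes, dropout rate, batch sizes, and Adamax settings follow the quoted text, while the toy encoder and the seed value are illustrative assumptions (the paper fixes a random seed but does not state its value, and the real FlowQA model is considerably larger).

    # Hedged sketch of the reported training configuration; not the authors' code.
    import random
    import torch
    import torch.nn as nn

    SEED = 0                                  # assumed value; the paper only says a fixed seed is used
    random.seed(SEED)
    torch.manual_seed(SEED)

    RNN_HIDDEN = 125                          # all RNN outputs are size 125, so BiRNN outputs are 250
    ATTN_HIDDEN = 250                         # fully-aware attention hidden size
    DROPOUT = 0.4                             # applied after embeddings (GloVe, CoVe, ELMo) and before linear layers
    MAX_EPOCHS = 20                           # convergence is reported in roughly 10 to 20 epochs
    BATCH_SIZE = {"CoQA": 1, "QuAC": 3}       # batch size measured in dialogs

    class ToyEncoder(nn.Module):
        """Stand-in encoder that only illustrates the reported layer sizes."""
        def __init__(self, emb_dim: int = 300):
            super().__init__()
            self.dropout = nn.Dropout(DROPOUT)
            self.birnn = nn.LSTM(emb_dim, RNN_HIDDEN, bidirectional=True, batch_first=True)
            self.attn_proj = nn.Linear(2 * RNN_HIDDEN, ATTN_HIDDEN)

        def forward(self, x):
            out, _ = self.birnn(self.dropout(x))
            return self.attn_proj(out)

    model = ToyEncoder()
    optimizer = torch.optim.Adamax(model.parameters(),
                                   lr=0.002, betas=(0.9, 0.999), eps=1e-8)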
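
The Dataset Splits row also quotes the statement that a decision threshold is tuned on the development set to maximize F1. The sketch below shows that kind of sweep in the abstract; the scores, labels, threshold grid, and the use of binary F1 (rather than the task's span-level F1) are all illustrative assumptions, since the paper does not detail the procedure beyond the quoted sentence.

    # Hedged sketch: pick the decision threshold that maximizes F1 on the development set.
    import numpy as np
    from sklearn.metrics import f1_score

    rng = np.random.default_rng(0)
    dev_scores = rng.random(1000)                 # placeholder model confidence scores
    dev_labels = rng.integers(0, 2, size=1000)    # placeholder gold labels

    best_threshold, best_f1 = 0.0, -1.0
    for threshold in np.linspace(0.0, 1.0, 101):
        preds = (dev_scores >= threshold).astype(int)
        f1 = f1_score(dev_labels, preds, zero_division=0)
        if f1 > best_f1:
            best_threshold, best_f1 = threshold, f1

    print(f"best threshold on dev: {best_threshold:.2f} (F1 = {best_f1:.3f})")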