FlowQA: Grasping Flow in History for Conversational Machine Comprehension

Authors: Hsin-Yuan Huang, Eunsol Choi, Wen-tau Yih

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | FLOWQA achieves strong empirical results on conversational machine comprehension tasks, and improves the state of the art on various datasets (from 67.8% to 75.0% on CoQA and 60.1% to 64.1% on QuAC). (Section 4: Experiments: Conversational Machine Comprehension)
Researcher Affiliation | Collaboration | Hsin-Yuan Huang (California Institute of Technology, hsinyuan@caltech.edu); Eunsol Choi (University of Washington, eunsol@cs.washington.edu); Wen-tau Yih (Allen Institute for Artificial Intelligence, scottyih@allenai.org)
Pseudocode | No | The paper does not contain any clearly labeled "Pseudocode" or "Algorithm" blocks or figures.
Open Source Code | Yes | Our code can be found in https://github.com/momohuang/FlowQA.
Open Datasets | Yes | We experiment with the QuAC (Choi et al., 2018) and CoQA (Reddy et al., 2019) datasets.
Dataset Splits | Yes | The decision threshold is tuned on the development set to maximize the F1 score. The development and test set results are reported in Tables 4 and 5. (A minimal sketch of this kind of threshold sweep appears after the table.)
Hardware Specification | No | The paper discusses training time and speedup, but it does not specify the hardware used for the experiments, such as CPU or GPU models or memory capacity.
Software Dependencies | No | The paper mentions software such as PyTorch and spaCy, but it does not give version numbers for these or any other key software components, which would be needed to reproduce the environment.
Experiment Setup | Yes | We use a maximum of 20 epochs, with each epoch passing through the data once. It roughly takes 10 to 20 epochs to converge. All RNN output sizes are set to 125, and thus the BiRNN output is of size 250. The attention hidden size used in fully-aware attention is set to 250. During training, we use a dropout rate of 0.4 (Srivastava et al., 2014) after the embedding layer (GloVe, CoVe and ELMo) and before applying any linear transformation. The batch size is set to one dialog for CoQA, and three dialogs for QuAC. The optimizer is Adamax (Kingma & Ba, 2015) with a learning rate α = 0.002, β = (0.9, 0.999) and ϵ = 10^-8. A fixed random seed is used across all experiments.
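
The Experiment Setup row pins down most of the training hyperparameters. Below is a minimal PyTorch sketch of that configuration; the layer sizes, dropout rate, batch sizes, and Adamax settings follow the quoted text, while the toy encoder and the seed value are illustrative assumptions (the paper fixes a random seed but does not state its value, and the real FlowQA model is considerably larger).

    # Hedged sketch of the reported training configuration; not the authors' code.
    import random
    import torch
    import torch.nn as nn

    SEED = 0                                  # assumed value; the paper only says a fixed seed is used
    random.seed(SEED)
    torch.manual_seed(SEED)

    RNN_HIDDEN = 125                          # all RNN outputs are size 125, so BiRNN outputs are 250
    ATTN_HIDDEN = 250                         # fully-aware attention hidden size
    DROPOUT = 0.4                             # applied after embeddings (GloVe, CoVe, ELMo) and before linear layers
    MAX_EPOCHS = 20                           # convergence is reported in roughly 10 to 20 epochs
    BATCH_SIZE = {"CoQA": 1, "QuAC": 3}       # batch size measured in dialogs

    class ToyEncoder(nn.Module):
        """Stand-in encoder that only illustrates the reported layer sizes."""
        def __init__(self, emb_dim: int = 300):
            super().__init__()
            self.dropout = nn.Dropout(DROPOUT)
            self.birnn = nn.LSTM(emb_dim, RNN_HIDDEN, bidirectional=True, batch_first=True)
            self.attn_proj = nn.Linear(2 * RNN_HIDDEN, ATTN_HIDDEN)

        def forward(self, x):
            out, _ = self.birnn(self.dropout(x))
            return self.attn_proj(out)

    model = ToyEncoder()
    optimizer = torch.optim.Adamax(model.parameters(),
                                   lr=0.002, betas=(0.9, 0.999), eps=1e-8)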
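
The Dataset Splits row also quotes the statement that a decision threshold is tuned on the development set to maximize F1. The sketch below shows that kind of sweep in the abstract; the scores, labels, threshold grid, and the use of binary F1 (rather than the task's span-level F1) are all illustrative assumptions, since the paper does not detail the procedure beyond the quoted sentence.

    # Hedged sketch: pick the decision threshold that maximizes F1 on the development set.
    import numpy as np
    from sklearn.metrics import f1_score

    rng = np.random.default_rng(0)
    dev_scores = rng.random(1000)                 # placeholder model confidence scores
    dev_labels = rng.integers(0, 2, size=1000)    # placeholder gold labels

    best_threshold, best_f1 = 0.0, -1.0
    for threshold in np.linspace(0.0, 1.0, 101):
        preds = (dev_scores >= threshold).astype(int)
        f1 = f1_score(dev_labels, preds, zero_division=0)
        if f1 > best_f1:
            best_threshold, best_f1 = threshold, f1

    print(f"best threshold on dev: {best_threshold:.2f} (F1 = {best_f1:.3f})")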