FlowQA: Grasping Flow in History for Conversational Machine Comprehension
Authors: Hsin-Yuan Huang, Eunsol Choi, Wen-tau Yih
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | FLOWQA achieves strong empirical results on conversational machine comprehension tasks, and improves the state of the art on various datasets (from 67.8% to 75.0% on CoQA and 60.1% to 64.1% on QuAC). (Section 4, "Experiments: Conversational Machine Comprehension") |
| Researcher Affiliation | Collaboration | Hsin-Yuan Huang, California Institute of Technology (hsinyuan@caltech.edu); Eunsol Choi, University of Washington (eunsol@cs.washington.edu); Wen-tau Yih, Allen Institute for Artificial Intelligence (scottyih@allenai.org) |
| Pseudocode | No | The paper does not contain any clearly labeled "Pseudocode" or "Algorithm" blocks or figures. |
| Open Source Code | Yes | Our code can be found in https://github.com/momohuang/FlowQA. |
| Open Datasets | Yes | We experiment with the QuAC (Choi et al., 2018) and CoQA (Reddy et al., 2019) datasets. |
| Dataset Splits | Yes | The decision threshold is tuned on the development set to maximize the F1 score. The development and test set results are reported in Tables 4 and 5. (A sketch of this threshold-tuning step follows the table.) |
| Hardware Specification | No | The paper discusses training time and speedup, but it does not specify any particular hardware components like CPU or GPU models, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper mentions software like PyTorch and spaCy, but it does not provide specific version numbers for these or any other key software components, which is required for reproducibility. |
| Experiment Setup | Yes | We use a maximum of 20 epochs, with each epoch passing through the data once. It roughly takes 10 to 20 epochs to converge. All RNN output size is set to 125, and thus the BiRNN output would be of size 250. The attention hidden size used in fully-aware attention is set to 250. During training, we use a dropout rate of 0.4 (Srivastava et al., 2014) after the embedding layer (GloVe, CoVe and ELMo) and before applying any linear transformation. The batch size is set to one dialog for CoQA, and three dialogs for QuAC. The optimizer is Adamax (Kingma & Ba, 2015) with a learning rate α = 0.002, β = (0.9, 0.999) and ε = 10⁻⁸. A fixed random seed is used across all experiments. |
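
To make the Experiment Setup row concrete, below is a minimal PyTorch sketch of the reported hyperparameters (hidden sizes, dropout, batch sizes, Adamax settings). The `ToyEncoder` module and the seed value are illustrative assumptions, not the authors' FlowQA architecture or their actual configuration code; see the linked repository for that.

```python
# Minimal sketch of the reported training configuration (not the authors' code).
import torch
import torch.nn as nn

HIDDEN_SIZE = 125        # per-direction RNN output size, so the BiRNN output is 250
ATTN_HIDDEN = 250        # hidden size of the fully-aware attention (recorded; unused in this toy module)
DROPOUT = 0.4            # applied after the embedding layer and before any linear transformation
MAX_EPOCHS = 20          # the paper reports convergence in roughly 10 to 20 epochs
COQA_BATCH_DIALOGS = 1   # batch size: one dialog for CoQA
QUAC_BATCH_DIALOGS = 3   # batch size: three dialogs for QuAC

torch.manual_seed(0)     # the paper fixes one seed across experiments; the value is not stated

class ToyEncoder(nn.Module):
    """Stand-in bidirectional encoder with the reported sizes; not FlowQA itself."""
    def __init__(self, input_dim=300):
        super().__init__()
        self.dropout = nn.Dropout(DROPOUT)
        self.birnn = nn.GRU(input_dim, HIDDEN_SIZE,
                            bidirectional=True, batch_first=True)

    def forward(self, x):
        out, _ = self.birnn(self.dropout(x))
        return out  # shape (batch, seq_len, 2 * HIDDEN_SIZE) = (batch, seq_len, 250)

model = ToyEncoder()
# Adamax with the reported learning rate, betas, and epsilon.
optimizer = torch.optim.Adamax(model.parameters(),
                               lr=0.002, betas=(0.9, 0.999), eps=1e-8)
```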
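
The Dataset Splits row also notes that a decision threshold (used to predict unanswerable questions) is tuned on the development set to maximize F1. The sketch below shows one way such a sweep could look; the input arrays and the per-question F1 bookkeeping are hypothetical placeholders, since the paper itself relies on the official evaluation scripts.

```python
# Sketch: sweep a no-answer threshold over dev predictions to maximize mean F1.
import numpy as np

def tune_no_answer_threshold(no_answer_scores, span_f1, gold_unanswerable):
    """Return the threshold on the no-answer score that maximizes mean dev F1.

    no_answer_scores:  model's no-answer score per dev question (float array)
    span_f1:           F1 of the predicted span for each question (float array)
    gold_unanswerable: True where the gold answer is "no answer" (bool array)
    """
    best_thr, best_f1 = None, -1.0
    for thr in np.unique(no_answer_scores):
        predict_no_answer = no_answer_scores >= thr
        # Per-question F1: 1.0 when prediction and gold agree on "no answer",
        # the span F1 when both are answerable, and 0.0 when they disagree.
        f1 = np.where(predict_no_answer,
                      gold_unanswerable.astype(float),
                      np.where(gold_unanswerable, 0.0, span_f1))
        if f1.mean() > best_f1:
            best_thr, best_f1 = thr, f1.mean()
    return best_thr, best_f1

# Example usage with toy arrays:
# thr, f1 = tune_no_answer_threshold(np.array([0.2, 0.8, 0.5]),
#                                    np.array([0.9, 0.0, 0.6]),
#                                    np.array([False, True, False]))
```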