FusionNet: Fusing via Fully-aware Attention with Application to Machine Comprehension
Authors: Hsin-Yuan Huang, Chenguang Zhu, Yelong Shen, Weizhu Chen
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply FusionNet to the Stanford Question Answering Dataset (SQuAD) and it achieves the first position for both single and ensemble model on the official SQuAD leaderboard at the time of writing (Oct. 4th, 2017). Meanwhile, we verify the generalization of FusionNet with two adversarial SQuAD datasets and it sets up the new state-of-the-art on both datasets: on AddSent, FusionNet increases the best F1 metric from 46.6% to 51.4%; on AddOneSent, FusionNet boosts the best F1 metric from 56.0% to 60.7%. |
| Researcher Affiliation | Collaboration | Hsin-Yuan Huang (Microsoft Business AI and Research; National Taiwan University), Chenguang Zhu, Yelong Shen, Weizhu Chen (Microsoft Business AI and Research); momohuang@gmail.com, {chezhu,yeshen,wzchen}@microsoft.com |
| Pseudocode | No | The paper describes the architecture and processes using diagrams and mathematical equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | An open-source implementation of FusionNet can be found at https://github.com/momohuang/FusionNet-NLI. |
| Open Datasets | Yes | We focus on the SQuAD dataset (Rajpurkar et al., 2016) to train and evaluate our model. SQuAD is a popular machine comprehension dataset consisting of 100,000+ questions created by crowd workers on 536 Wikipedia articles. |
| Dataset Splits | Yes | We focus on the SQuAD dataset (Rajpurkar et al., 2016) to train and evaluate our model. |
| Hardware Specification | Yes | On a single NVIDIA GeForce GTX Titan X GPU, each epoch took roughly 20 minutes with a batch size of 32. |
| Software Dependencies | No | The paper mentions software like "PyTorch" and "spaCy" but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | Detailed experimental settings can be found in Appendix E, which states: "The batch size is set to 32, and the optimizer is Adamax (Kingma & Ba, 2014) with a learning rate α = 0.002, β = (0.9, 0.999) and ϵ = 10⁻⁸. A fixed random seed is used across all experiments. During training, we use a dropout rate of 0.4 (Srivastava et al., 2014) after the embedding layer (GloVe and CoVe) and before applying any linear transformation." A hedged sketch of this configuration appears below the table. |
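
For illustration, here is a minimal PyTorch sketch of the training configuration quoted in the Experiment Setup row. The placeholder model, the seed value, and all variable names are assumptions made for this sketch; only the hyperparameters (Adamax with α = 0.002, β = (0.9, 0.999), ϵ = 10⁻⁸, batch size 32, and dropout 0.4 after the GloVe/CoVe embeddings) come from the paper.

```python
import torch
import torch.nn as nn

# The paper fixes a random seed across all experiments but does not report
# its value; 0 here is a placeholder.
torch.manual_seed(0)

# Placeholder module standing in for the FusionNet architecture, which is
# not reproduced in this sketch.
model = nn.Linear(300, 2)

# Optimizer settings quoted from Appendix E.
optimizer = torch.optim.Adamax(
    model.parameters(),
    lr=0.002,            # alpha = 0.002
    betas=(0.9, 0.999),  # beta = (0.9, 0.999)
    eps=1e-8,            # epsilon = 10^-8
)

# Dropout applied after the embedding layer (GloVe and CoVe) and before
# any linear transformation.
embedding_dropout = nn.Dropout(p=0.4)

batch_size = 32
```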