Transformer-XH: Multi-Evidence Reasoning with eXtra Hop Attention
Authors: Chen Zhao, Chenyan Xiong, Corby Rosset, Xia Song, Paul Bennett, Saurabh Tiwary
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments are conducted on Hotpot QA, the multi-hop question answering benchmark Yang et al. (2018), and FEVER, the fact verification benchmark Thorne et al. (2018). |
| Researcher Affiliation | Collaboration | Chen Zhao, University of Maryland, College Park, chenz@cs.umd.edu; Chenyan Xiong, Corby Rosset, Xia Song, Paul Bennett, and Saurabh Tiwary, Microsoft AI & Research, {cxiong, corosset, xiaso, pauben, satiwary}@microsoft.com |
| Pseudocode | No | The paper describes the Transformer-XH model conceptually and mathematically but does not provide pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | Code available at https://aka.ms/transformer-xh. |
| Open Datasets | Yes | Our experiments are conducted on Hotpot QA, the multi-hop question answering benchmark Yang et al. (2018), and FEVER, the fact verification benchmark Thorne et al. (2018). |
| Dataset Splits | Yes | There are 90k Train, 7k Dev and 7k Test questions. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments. |
| Software Dependencies | No | The paper mentions using DGL (Wang et al., 2019) and BERT base model (Devlin et al., 2019) but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | The in-sequence attention and other standard Transformer components in Transformer-XH are initialized by the pre-trained BERT base model (Devlin et al., 2019). The extra hop attention parameters are initialized randomly and trained from scratch. The final model uses three hop steps... We use DGL (Wang et al., 2019) for implementing Transformer-XH and CogQA (w. BERT IR) with batch size 1 (i.e., one graph per batch), and keep the other parameters the same as the default BERT setting. We train Transformer-XH separately on two different types of questions... We train Transformer-XH and the GNN of CogQA (w. BERT IR) for 2 epochs. All other BERT-based models use the default BERT parameters and are trained for 1 epoch. |
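The experiment setup above initializes the in-sequence attention from BERT base and trains the randomly initialized extra hop attention from scratch. As a rough illustration of the extra hop mechanism the paper describes (attention propagated along edges of the evidence graph between the first-token "hop" states of each text sequence), the sketch below shows one such step. This is a minimal sketch, not the released implementation at https://aka.ms/transformer-xh; the class name `ExtraHopAttention` and the `hop_states`/`adjacency` tensors are illustrative assumptions.

```python
# Minimal sketch of one eXtra Hop attention step (illustrative, not the official code).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExtraHopAttention(nn.Module):
    """Propagates each evidence node's first-token ("hop") state to its
    graph neighbors via scaled dot-product attention over the adjacency."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.q = nn.Linear(hidden_size, hidden_size)
        self.k = nn.Linear(hidden_size, hidden_size)
        self.v = nn.Linear(hidden_size, hidden_size)
        self.scale = math.sqrt(hidden_size)

    def forward(self, hop_states: torch.Tensor, adjacency: torch.Tensor) -> torch.Tensor:
        # hop_states: (num_nodes, hidden_size) -- first-token state of each node
        # adjacency:  (num_nodes, num_nodes)   -- 1 where an edge exists, else 0
        q, k, v = self.q(hop_states), self.k(hop_states), self.v(hop_states)
        scores = q @ k.t() / self.scale
        scores = scores.masked_fill(adjacency == 0, float("-inf"))
        attn = F.softmax(scores, dim=-1)
        # Nodes with no neighbors yield NaN rows after the masked softmax; zero them.
        attn = torch.nan_to_num(attn)
        return attn @ v  # hop-attended representation for every node
```

In the full model this hop-attended state would be combined with the standard in-sequence attention output inside each Transformer layer and repeated for the three hop steps mentioned in the setup; batch size 1 corresponds to processing one evidence graph at a time, as the paper states.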