Transformer-XH: Multi-Evidence Reasoning with eXtra Hop Attention

Authors: Chen Zhao, Chenyan Xiong, Corby Rosset, Xia Song, Paul Bennett, Saurabh Tiwary

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments are conducted on HotpotQA, the multi-hop question answering benchmark (Yang et al., 2018), and FEVER, the fact verification benchmark (Thorne et al., 2018).
Researcher Affiliation | Collaboration | Chen Zhao, University of Maryland, College Park (chenz@cs.umd.edu); Chenyan Xiong, Corby Rosset, Xia Song, Paul Bennett, and Saurabh Tiwary, Microsoft AI & Research ({cxiong, corosset, xiaso, pauben, satiwary}@microsoft.com)
Pseudocode | No | The paper describes the Transformer-XH model conceptually and mathematically but does not provide pseudocode or a clearly labeled algorithm block (an illustrative sketch of the extra hop attention idea follows the table).
Open Source Code | Yes | Code available at https://aka.ms/transformer-xh.
Open Datasets | Yes | Our experiments are conducted on HotpotQA, the multi-hop question answering benchmark (Yang et al., 2018), and FEVER, the fact verification benchmark (Thorne et al., 2018).
Dataset Splits | Yes | There are 90k Train, 7k Dev and 7k Test questions.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments.
Software Dependencies | No | The paper mentions using DGL (Wang et al., 2019) and the BERT base model (Devlin et al., 2019) but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | The in-sequence attention and other standard Transformer components in Transformer-XH are initialized by the pre-trained BERT base model (Devlin et al., 2019). The extra hop attention parameters are initialized randomly and trained from scratch. The final model uses three hop steps... We use DGL (Wang et al., 2019) for implementing Transformer-XH and CogQA (w. BERT IR) with batch size 1 (i.e., one graph for each batch), and keep the other parameters the same as the default BERT setting. We train Transformer-XH separately on two different types of questions... We train Transformer-XH and the GNN of CogQA (w. BERT IR) for 2 epochs. All other BERT-based models use the default BERT parameters and train the model for 1 epoch. (A configuration sketch summarising this setup follows the table.)
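
For illustration of the extra hop attention idea referenced in the Pseudocode row, here is a minimal PyTorch sketch: the hub ([CLS]) token of each evidence sequence attends over the hub tokens of the sequences it is linked to in the evidence graph. The class name, tensor shapes, and the final combination layer are assumptions made for this sketch, not the authors' released DGL implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExtraHopAttention(nn.Module):
    """Minimal sketch of one extra hop attention step over the hub ([CLS])
    tokens of linked text sequences. Projections and the combination layer
    are illustrative assumptions, not the released Transformer-XH code."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)
        # Combines each node's in-sequence hub state with information
        # hopped in from its neighbours in the evidence graph.
        self.combine = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, hub_states: torch.Tensor, adjacency: torch.Tensor) -> torch.Tensor:
        # hub_states: (num_nodes, hidden) -- [CLS] vector of each sequence node.
        # adjacency:  (num_nodes, num_nodes) -- nonzero where nodes are linked;
        #             assumed to include self-loops so every row attends somewhere.
        scale = hub_states.size(-1) ** 0.5
        scores = self.query(hub_states) @ self.key(hub_states).t() / scale
        scores = scores.masked_fill(adjacency == 0, float("-inf"))
        hopped = F.softmax(scores, dim=-1) @ self.value(hub_states)
        return self.combine(torch.cat([hub_states, hopped], dim=-1))
```

In the full model, a step like this is combined with the standard in-sequence attention of each Transformer layer, and the final model uses three such hop steps, as reported in the Experiment Setup row.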
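
The following hypothetical configuration dictionary summarises the experiment setup quoted above; the field names are illustrative assumptions rather than arguments of the released code, and hyperparameters not stated in the paper are left at BERT defaults rather than invented here.

```python
# Configuration sketch mirroring the Experiment Setup row (field names assumed).
transformer_xh_setup = {
    "base_encoder": "bert-base-uncased",  # in-sequence attention initialized from BERT base
    "extra_hop_init": "random",           # extra hop attention trained from scratch
    "num_hop_steps": 3,                   # final model uses three hop steps
    "graph_library": "dgl",               # implemented with DGL (Wang et al., 2019)
    "batch_size": 1,                      # one evidence graph per batch
    "epochs_transformer_xh": 2,           # Transformer-XH and the CogQA (w. BERT IR) GNN
    "epochs_bert_baselines": 1,           # other BERT-based models, default BERT parameters
}
```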