Transformer-XH: Multi-Evidence Reasoning with eXtra Hop Attention

Authors: Chen Zhao, Chenyan Xiong, Corby Rosset, Xia Song, Paul Bennett, Saurabh Tiwary

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments are conducted on HotpotQA, the multi-hop question answering benchmark (Yang et al., 2018), and FEVER, the fact verification benchmark (Thorne et al., 2018).
Researcher Affiliation | Collaboration | Chen Zhao, University of Maryland, College Park (chenz@cs.umd.edu); Chenyan Xiong, Corby Rosset, Xia Song, Paul Bennett, and Saurabh Tiwary, Microsoft AI & Research ({cxiong, corosset, xiaso, pauben, satiwary}@microsoft.com)
Pseudocode | No | The paper describes the Transformer-XH model conceptually and mathematically but does not provide pseudocode or a clearly labeled algorithm block (an illustrative sketch of the extra hop attention idea follows the table).
Open Source Code | Yes | Code available at https://aka.ms/transformer-xh.
Open Datasets | Yes | Our experiments are conducted on HotpotQA, the multi-hop question answering benchmark (Yang et al., 2018), and FEVER, the fact verification benchmark (Thorne et al., 2018).
Dataset Splits | Yes | There are 90k Train, 7k Dev and 7k Test questions.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments.
Software Dependencies | No | The paper mentions using DGL (Wang et al., 2019) and the BERT base model (Devlin et al., 2019) but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | The in-sequence attention and other standard Transformer components in Transformer-XH are initialized by the pre-trained BERT base model (Devlin et al., 2019). The extra hop attention parameters are initialized randomly and trained from scratch. The final model uses three hop steps... We use DGL (Wang et al., 2019) for implementing Transformer-XH and CogQA (w. BERT IR) with batch size 1 (i.e., one graph for each batch), and keep the other parameters the same as the default BERT setting. We train Transformer-XH separately on two different types of questions... We train Transformer-XH and the GNN of CogQA (w. BERT IR) for 2 epochs. All other BERT-based models use the default BERT parameters and train the model for 1 epoch. (A configuration sketch summarising this setup follows the table.)
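
For illustration of the extra hop attention idea referenced in the Pseudocode row, here is a minimal PyTorch sketch: the hub ([CLS]) token of each evidence sequence attends over the hub tokens of the sequences it is linked to in the evidence graph. The class name, tensor shapes, and the final combination layer are assumptions made for this sketch, not the authors' released DGL implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExtraHopAttention(nn.Module):
    """Minimal sketch of one extra hop attention step over the hub ([CLS])
    tokens of linked text sequences. Projections and the combination layer
    are illustrative assumptions, not the released Transformer-XH code."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)
        # Combines each node's in-sequence hub state with information
        # hopped in from its neighbours in the evidence graph.
        self.combine = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, hub_states: torch.Tensor, adjacency: torch.Tensor) -> torch.Tensor:
        # hub_states: (num_nodes, hidden) -- [CLS] vector of each sequence node.
        # adjacency:  (num_nodes, num_nodes) -- nonzero where nodes are linked;
        #             assumed to include self-loops so every row attends somewhere.
        scale = hub_states.size(-1) ** 0.5
        scores = self.query(hub_states) @ self.key(hub_states).t() / scale
        scores = scores.masked_fill(adjacency == 0, float("-inf"))
        hopped = F.softmax(scores, dim=-1) @ self.value(hub_states)
        return self.combine(torch.cat([hub_states, hopped], dim=-1))
```

In the full model, a step like this is combined with the standard in-sequence attention of each Transformer layer, and the final model uses three such hop steps, as reported in the Experiment Setup row.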
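
The following hypothetical configuration dictionary summarises the experiment setup quoted above; the field names are illustrative assumptions rather than arguments of the released code, and hyperparameters not stated in the paper are left at BERT defaults rather than invented here.

```python
# Configuration sketch mirroring the Experiment Setup row (field names assumed).
transformer_xh_setup = {
    "base_encoder": "bert-base-uncased",  # in-sequence attention initialized from BERT base
    "extra_hop_init": "random",           # extra hop attention trained from scratch
    "num_hop_steps": 3,                   # final model uses three hop steps
    "graph_library": "dgl",               # implemented with DGL (Wang et al., 2019)
    "batch_size": 1,                      # one evidence graph per batch
    "epochs_transformer_xh": 2,           # Transformer-XH and the CogQA (w. BERT IR) GNN
    "epochs_bert_baselines": 1,           # other BERT-based models, default BERT parameters
}
```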