Multi-Hop Fact Checking of Political Claims
Authors: Wojciech Ostrowski, Arnav Arora, Pepa Atanasova, Isabelle Augenstein
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We: 1) construct a small annotated dataset, PolitiHop, of evidence sentences for claim verification; 2) compare it to existing multi-hop datasets; and 3) study how to transfer knowledge from more extensive in- and out-of-domain resources to PolitiHop. We find that the task is complex and achieve the best performance with an architecture that specifically models reasoning over evidence pieces in combination with in-domain transfer learning. |
| Researcher Affiliation | Academia | Wojciech Ostrowski, Arnav Arora, Pepa Atanasova and Isabelle Augenstein, Department of Computer Science, University of Copenhagen, Denmark. qnj566@alumni.ku.dk, {aar, pepa, augenstein}@di.ku.dk |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. The methodology is described in narrative text. |
| Open Source Code | Yes | We make the PolitiHop dataset and the code for the experiments publicly available on https://github.com/copenlu/politihop . |
| Open Datasets | Yes | We make the PolitiHop dataset and the code for the experiments publicly available on https://github.com/copenlu/politihop . |
| Dataset Splits | Yes | It consists of 500 manually annotated claims in written English, split into a training (300 instances) and a test set (200 instances). ... We split the training data into train and dev datasets, where the former has 592 examples and the latter 141. (See the split-loading sketch below the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments (e.g., specific GPU or CPU models, memory sizes). It mentions models like BERT and Transformer-XH but not the computational resources. |
| Software Dependencies | No | The paper mentions using BERT [Devlin et al., 2019] and Transformer-XH [Zhao et al., 2020] but does not specify version numbers for these or other software dependencies (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | In our experiments, we set k = 6 since this is the average number of evidence sentences selected by a single annotator. ... We use three eXtra hop layers as in [Zhao et al., 2020], which corresponds to three-hop reasoning, and we experiment with varying the number of hops. (See the setup sketch below the table.) |
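
As a companion to the split figures quoted above, here is a minimal sketch of loading the PolitiHop splits. The file names and TSV layout are assumptions based on the public repository (https://github.com/copenlu/politihop), not details confirmed by the paper; adjust the paths to match the actual release.

```python
# Minimal sketch of loading the PolitiHop splits described in the
# "Dataset Splits" row. File names are assumptions based on the
# repository at https://github.com/copenlu/politihop.
import pandas as pd

train = pd.read_csv("politihop/politihop_train.tsv", sep="\t")  # hypothetical path
valid = pd.read_csv("politihop/politihop_valid.tsv", sep="\t")  # hypothetical path
test = pd.read_csv("politihop/politihop_test.tsv", sep="\t")    # hypothetical path

# Sanity-check the sizes against the figures quoted in the table
# (592 train / 141 dev / 200 test examples).
print(len(train), len(valid), len(test))
```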
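
The "Experiment Setup" row names two concrete choices: top-k evidence selection with k = 6 and three eXtra hop layers, corresponding to three-hop reasoning. The sketch below illustrates both with a simplified self-attention hop over evidence embeddings; it is an illustrative stand-in under stated assumptions, not the authors' Transformer-XH implementation, and the dot-product evidence scorer is hypothetical.

```python
# Simplified sketch of the setup quoted above: keep the top k = 6
# evidence sentences, then apply three attention "hops" over their
# representations (a stand-in for Transformer-XH's eXtra hop layers).
import torch
import torch.nn.functional as F

K, HOPS, DIM = 6, 3, 768  # k = 6 and three hops, as in the paper's setup

def rank_evidence(sentence_embs: torch.Tensor, claim_emb: torch.Tensor) -> torch.Tensor:
    """Keep the k sentences scoring highest against the claim (hypothetical scorer)."""
    scores = sentence_embs @ claim_emb                      # (num_sentences,)
    top = torch.topk(scores, k=min(K, scores.numel())).indices
    return sentence_embs[top]                               # (<=K, DIM)

def hop_attention(nodes: torch.Tensor) -> torch.Tensor:
    """One simplified hop: every evidence node attends to all the others."""
    attn = F.softmax(nodes @ nodes.T / DIM ** 0.5, dim=-1)  # (K, K)
    return attn @ nodes                                     # (K, DIM)

sentences = torch.randn(20, DIM)  # stand-in sentence embeddings
claim = torch.randn(DIM)          # stand-in claim embedding

nodes = rank_evidence(sentences, claim)
for _ in range(HOPS):             # three-hop reasoning over evidence pieces
    nodes = hop_attention(nodes)
```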