LOREN: Logic-Regularized Reasoning for Interpretable Fact Verification
Authors: Jiangjie Chen, Qiaoben Bao, Changzhi Sun, Xinbo Zhang, Jiaze Chen, Hao Zhou, Yanghua Xiao, Lei Li
AAAI 2022, pp. 10482-10491 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on a public fact verification benchmark show that LOREN is competitive against previous approaches while enjoying the merit of faithful and accurate interpretability. We evaluate our verification method on a large-scale fact verification benchmark, i.e., FEVER 1.0 shared task (Thorne et al. 2018) |
| Researcher Affiliation | Collaboration | Jiangjie Chen1,2*, Qiaoben Bao1, Changzhi Sun2, Xinbo Zhang2, Jiaze Chen2, Hao Zhou2, Yanghua Xiao1,4, Lei Li3 1Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University 2ByteDance AI Lab 3University of California, Santa Barbara 4Fudan-Aishu Cognitive Intelligence Joint Research Center |
| Pseudocode | No | The paper does not include any sections or blocks explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present structured code-like procedural steps. |
| Open Source Code | Yes | The resources of LOREN are available at: https://github.com/jiangjiechen/LOREN. |
| Open Datasets | Yes | We evaluate our verification method on a large-scale fact verification benchmark, i.e., FEVER 1.0 shared task (Thorne et al. 2018), which is split into training, development and blind test set. The statistical report of FEVER dataset is presented in Table 1, with the split sizes of SUPPORTED (SUP), REFUTED (REF) and NOT ENOUGH INFO (NEI) classes. |
| Dataset Splits | Yes | The statistical report of FEVER dataset is presented in Table 1, with the split sizes of SUPPORTED (SUP), REFUTED (REF) and NOT ENOUGH INFO (NEI) classes. (Table 1 shows 'Training', 'Development', 'Test' splits with specific counts; a label-count sketch follows the table below.) |
| Hardware Specification | Yes | The models are trained on 4 NVIDIA Tesla V100 GPUs for 5 hours for best performance on development set. |
| Software Dependencies | No | The paper mentions software by name and cites related papers (e.g., 'Hugging Face's implementation (Wolf et al. 2020)', 'BART-base model (Lewis et al. 2020)', 'DeBERTa (He et al. 2021)'). However, it does not provide specific version numbers for these software components or libraries, which is necessary for reproducibility. |
| Experiment Setup | Yes | During data preprocessing, we set the maximum lengths of x_global and x_local^(i) as 256 and 128 tokens respectively, and set the maximum number of phrases per claim as 8. During training, we set the initial learning rate of LOREN with BERT and RoBERTa as 2e-5 and 1e-5, and batch size as 16 and 8 respectively. The models are trained on 4 NVIDIA Tesla V100 GPUs for 5 hours for best performance on the development set. We train the model for 4 epochs with an initial learning rate of 5e-5, and use the checkpoint with the best ROUGE-2 score on the development set. |
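The FEVER 1.0 release quoted in the Open Datasets and Dataset Splits rows ships each split as JSONL, one claim per line, with a `label` field taking the values SUPPORTS, REFUTES, and NOT ENOUGH INFO. Below is a minimal sketch for reproducing the per-class counts reported in Table 1; the file path is an assumption, and the blind test set carries no labels:

```python
# Count SUPPORTS / REFUTES / NOT ENOUGH INFO examples in a FEVER 1.0 split.
# Applies to the training and development files; the blind test set is unlabelled.
import json
from collections import Counter

def label_counts(path: str) -> Counter:
    """Return the per-class label distribution of a FEVER JSONL file."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            counts[json.loads(line)["label"]] += 1
    return counts

# Hypothetical path to the official training split.
print(label_counts("train.jsonl"))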
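The Experiment Setup row describes two runs: fine-tuning the BERT/RoBERTa verifier, and training a generation model whose checkpoint is selected by ROUGE-2. The verifier settings map onto a Hugging Face `TrainingArguments` sketch as below; the output path, the per-device batch split across the 4 V100s, and the epoch count are assumptions, not values from the authors' released code (https://github.com/jiangjiechen/LOREN):

```python
# Sketch of the reported RoBERTa-verifier hyperparameters for LOREN.
from transformers import TrainingArguments

MAX_GLOBAL_LEN = 256  # max tokens for x_global (claim plus full evidence)
MAX_LOCAL_LEN = 128   # max tokens for each local input x_local^(i)
MAX_PHRASES = 8       # max phrases extracted per claim

# Paper reports lr 2e-5 / batch 16 for BERT and lr 1e-5 / batch 8 for RoBERTa.
args = TrainingArguments(
    output_dir="loren_roberta_ckpt",  # hypothetical output path
    learning_rate=1e-5,               # RoBERTa setting from the paper
    per_device_train_batch_size=2,    # assumption: global batch 8 over 4 GPUs
    num_train_epochs=3,               # assumption: paper states 5 hours, not epochs
    evaluation_strategy="epoch",      # select best checkpoint on the dev set
    save_strategy="epoch",
    load_best_model_at_end=True,
)
```

The BERT variant would swap in `learning_rate=2e-5` and a global batch of 16, per the same quote; the 4-epoch / 5e-5 / ROUGE-2 figures belong to the separate generation model and are not reflected here.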