LOREN: Logic-Regularized Reasoning for Interpretable Fact Verification
Authors: Jiangjie Chen, Qiaoben Bao, Changzhi Sun, Xinbo Zhang, Jiaze Chen, Hao Zhou, Yanghua Xiao, Lei Li
AAAI 2022, pp. 10482-10491 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on a public fact verification benchmark show that LOREN is competitive against previous approaches while enjoying the merit of faithful and accurate interpretability. We evaluate our verification method on a large-scale fact verification benchmark, i.e., FEVER 1.0 shared task (Thorne et al. 2018) |
| Researcher Affiliation | Collaboration | Jiangjie Chen1,2*, Qiaoben Bao1, Changzhi Sun2, Xinbo Zhang2, Jiaze Chen2, Hao Zhou2, Yanghua Xiao1,4, Lei Li3 1Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University 2ByteDance AI Lab 3University of California, Santa Barbara 4Fudan-Aishu Cognitive Intelligence Joint Research Center |
| Pseudocode | No | The paper does not include any sections or blocks explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present structured code-like procedural steps. |
| Open Source Code | Yes | The resources of LOREN are available at: https://github.com/jiangjiechen/LOREN. |
| Open Datasets | Yes | We evaluate our verification method on a large-scale fact verification benchmark, i.e., FEVER 1.0 shared task (Thorne et al. 2018), which is split into training, development and blind test set. The statistical report of FEVER dataset is presented in Table 1, with the split sizes of SUPPORTED (SUP), REFUTED (REF) and NOT ENOUGH INFO (NEI) classes. |
| Dataset Splits | Yes | The statistical report of FEVER dataset is presented in Table 1, with the split sizes of SUPPORTED (SUP), REFUTED (REF) and NOT ENOUGH INFO (NEI) classes. (Table 1 shows 'Training', 'Development', 'Test' splits with specific counts; a label-count sketch follows the table below.) |
| Hardware Specification | Yes | The models are trained on 4 NVIDIA Tesla V100 GPUs for 5 hours for best performance on development set. |
| Software Dependencies | No | The paper mentions software by name and cites related papers (e.g., 'Hugging Face's implementation (Wolf et al. 2020)', 'BART-base model (Lewis et al. 2020)', 'DeBERTa (He et al. 2021)'). However, it does not provide specific version numbers for these software components or libraries, which is necessary for reproducibility. |
| Experiment Setup | Yes | During data preprocessing, we set the maximum lengths of x_global and x_local^(i) as 256 and 128 tokens respectively, and set the maximum number of phrases per claim as 8. During training, we set the initial learning rate of LOREN with BERT and RoBERTa as 2e-5 and 1e-5, and batch size as 16 and 8 respectively. The models are trained on 4 NVIDIA Tesla V100 GPUs for 5 hours for best performance on the development set. We train the model for 4 epochs with an initial learning rate of 5e-5, and use the checkpoint with the best ROUGE-2 score on the development set. |
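The FEVER 1.0 release quoted in the Open Datasets and Dataset Splits rows ships each split as JSONL, one claim per line, with a `label` field taking the values SUPPORTS, REFUTES, and NOT ENOUGH INFO. Below is a minimal sketch for reproducing the per-class counts reported in Table 1; the file path is an assumption, and the blind test set carries no labels:

```python
# Count SUPPORTS / REFUTES / NOT ENOUGH INFO examples in a FEVER 1.0 split.
# Applies to the training and development files; the blind test set is unlabelled.
import json
from collections import Counter

def label_counts(path: str) -> Counter:
    """Return the per-class label distribution of a FEVER JSONL file."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            counts[json.loads(line)["label"]] += 1
    return counts

# Hypothetical path to the official training split.
print(label_counts("train.jsonl"))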
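The Experiment Setup row describes two runs: fine-tuning the BERT/RoBERTa verifier, and training a generation model whose checkpoint is selected by ROUGE-2. The verifier settings map onto a Hugging Face `TrainingArguments` sketch as below; the output path, the per-device batch split across the 4 V100s, and the epoch count are assumptions, not values from the authors' released code (https://github.com/jiangjiechen/LOREN):

```python
# Sketch of the reported RoBERTa-verifier hyperparameters for LOREN.
from transformers import TrainingArguments

MAX_GLOBAL_LEN = 256  # max tokens for x_global (claim plus full evidence)
MAX_LOCAL_LEN = 128   # max tokens for each local input x_local^(i)
MAX_PHRASES = 8       # max phrases extracted per claim

# Paper reports lr 2e-5 / batch 16 for BERT and lr 1e-5 / batch 8 for RoBERTa.
args = TrainingArguments(
    output_dir="loren_roberta_ckpt",  # hypothetical output path
    learning_rate=1e-5,               # RoBERTa setting from the paper
    per_device_train_batch_size=2,    # assumption: global batch 8 over 4 GPUs
    num_train_epochs=3,               # assumption: paper states 5 hours, not epochs
    evaluation_strategy="epoch",      # select best checkpoint on the dev set
    save_strategy="epoch",
    load_best_model_at_end=True,
)
```

The BERT variant would swap in `learning_rate=2e-5` and a global batch of 16, per the same quote; the 4-epoch / 5e-5 / ROUGE-2 figures belong to the separate generation model and are not reflected here.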