Causal Walk: Debiasing Multi-Hop Fact Verification with Front-Door Adjustment

Authors: Congzhi Zhang, Linhai Zhang, Deyu Zhou

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that Causal Walk outperforms some previous debiasing methods on both existing datasets and the newly constructed datasets.
Researcher Affiliation | Academia | Congzhi Zhang*, Linhai Zhang*, Deyu Zhou, School of Computer Science and Engineering, Key Laboratory of Computer Network and Information Integration, Ministry of Education, Southeast University, China. {zhangcongzhi, lzhang472, d.zhou}@seu.edu.cn
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code and data will be released at https://github.com/zcccccz/CausalWalk.
Open Datasets | Yes | We evaluate the model performance on the FEVER dataset and PolitiHop dataset and their variants. For training, all models are trained on the original training set of FEVER and PolitiHop... FEVER (Thorne et al. 2018) and PolitiHop (Ostrowski et al. 2021), respectively. Code and data will be released at https://github.com/zcccccz/CausalWalk.
Dataset Splits | Yes | For training, all models are trained on the original training set of FEVER and PolitiHop. For testing, the development set of FEVER and the test set of PolitiHop are adopted, denoted as FEVER (Thorne et al. 2018) and PolitiHop (Ostrowski et al. 2021), respectively.
Hardware Specification | No | The paper acknowledges support from the Big Data Computing Center of Southeast University but does not specify any particular hardware components, such as CPU/GPU models, memory, or machine configurations, used for the experiments.
Software Dependencies | No | The paper mentions using BERT-base and the Adam optimizer but does not provide specific version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages.
Experiment Setup | Yes | The learning rate is 1e-5. All models are trained for 10 epochs with a batch size of 4. We update the parameters using the Adam optimizer. BERT-Concat, CICR, and CLEVER have a maximum input length of 512, and the other models have a maximum input length of 128. The maximum number n of evidence per sample is 20. The beam width w is 3 and the path sampling length m is 5. The number of samples k for each category in the confounder dictionary is 5. The intervention weight parameter α is 0.1.
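For readers attempting to reproduce the setup, the hyperparameters quoted above can be gathered into a single configuration sketch. The paper does not name a framework or library versions, so PyTorch and Hugging Face transformers are assumed here, and every identifier (TrainConfig, build_optimizer, the bert-base-uncased checkpoint) is illustrative rather than taken from the authors' released code.

```python
# Minimal sketch of the reported training configuration (assumptions noted inline).
from dataclasses import dataclass

import torch
from transformers import BertModel


@dataclass
class TrainConfig:
    learning_rate: float = 1e-5      # "The learning rate is 1e-5."
    epochs: int = 10                 # "trained for 10 epochs"
    batch_size: int = 4              # "with a batch size of 4"
    max_input_len: int = 128         # 512 for BERT-Concat, CICR, CLEVER; 128 for the other models
    max_evidence_n: int = 20         # maximum number n of evidence per sample
    beam_width_w: int = 3            # beam width w
    path_length_m: int = 5           # path sampling length m
    confounder_k: int = 5            # samples k per category in the confounder dictionary
    intervention_alpha: float = 0.1  # intervention weight parameter α


def build_optimizer(model: torch.nn.Module, cfg: TrainConfig) -> torch.optim.Adam:
    # "We update the parameters using the Adam optimizer."
    return torch.optim.Adam(model.parameters(), lr=cfg.learning_rate)


if __name__ == "__main__":
    cfg = TrainConfig()
    # The paper only says "BERT-base"; the exact checkpoint is an assumption.
    encoder = BertModel.from_pretrained("bert-base-uncased")
    optimizer = build_optimizer(encoder, cfg)
    print(cfg)
```

This is a sketch of the reported settings only; the debiasing components themselves (random-walk path sampling and front-door adjustment) are not reconstructed here and should be taken from the authors' repository once released.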