Causal Walk: Debiasing Multi-Hop Fact Verification with Front-Door Adjustment

Authors: Congzhi Zhang, Linhai Zhang, Deyu Zhou

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that Causal Walk outperforms some previous debiasing methods on both existing datasets and the newly constructed datasets.
Researcher Affiliation | Academia | Congzhi Zhang*, Linhai Zhang*, Deyu Zhou, School of Computer Science and Engineering, Key Laboratory of Computer Network and Information Integration, Ministry of Education, Southeast University, China. {zhangcongzhi, lzhang472, d.zhou}@seu.edu.cn
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code and data will be released at https://github.com/zcccccz/CausalWalk.
Open Datasets | Yes | We evaluate the model performance on the FEVER dataset and PolitiHop dataset and their variants. For training, all models are trained on the original training set of FEVER and PolitiHop... FEVER (Thorne et al. 2018) and PolitiHop (Ostrowski et al. 2021), respectively. Code and data will be released at https://github.com/zcccccz/CausalWalk.
Dataset Splits | Yes | For training, all models are trained on the original training set of FEVER and PolitiHop. For testing, the development set of FEVER and the test set of PolitiHop are adopted, denoted as FEVER (Thorne et al. 2018) and PolitiHop (Ostrowski et al. 2021), respectively.
Hardware Specification | No | The paper acknowledges support from the Big Data Computing Center of Southeast University but does not specify any particular hardware components, such as CPU/GPU models, memory, or machine configurations, used for the experiments.
Software Dependencies | No | The paper mentions using BERT-base and the Adam optimizer but does not provide specific version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages.
Experiment Setup | Yes | The learning rate is 1e-5. All models are trained for 10 epochs with a batch size of 4. We update the parameters using the Adam optimizer. BERT-Concat, CICR, and CLEVER have a maximum input length of 512, and the other models have a maximum input length of 128. The maximum number n of evidence per sample is 20. The beam width w is 3 and the path sampling length m is 5. The number of samples k for each category in the confounder dictionary is 5. The intervention weight parameter α is 0.1.
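For readers attempting to reproduce the setup, the hyperparameters quoted above can be gathered into a single configuration sketch. The paper does not name a framework or library versions, so PyTorch and Hugging Face transformers are assumed here, and every identifier (TrainConfig, build_optimizer, the bert-base-uncased checkpoint) is illustrative rather than taken from the authors' released code.

```python
# Minimal sketch of the reported training configuration (assumptions noted inline).
from dataclasses import dataclass

import torch
from transformers import BertModel


@dataclass
class TrainConfig:
    learning_rate: float = 1e-5      # "The learning rate is 1e-5."
    epochs: int = 10                 # "trained for 10 epochs"
    batch_size: int = 4              # "with a batch size of 4"
    max_input_len: int = 128         # 512 for BERT-Concat, CICR, CLEVER; 128 for the other models
    max_evidence_n: int = 20         # maximum number n of evidence per sample
    beam_width_w: int = 3            # beam width w
    path_length_m: int = 5           # path sampling length m
    confounder_k: int = 5            # samples k per category in the confounder dictionary
    intervention_alpha: float = 0.1  # intervention weight parameter α


def build_optimizer(model: torch.nn.Module, cfg: TrainConfig) -> torch.optim.Adam:
    # "We update the parameters using the Adam optimizer."
    return torch.optim.Adam(model.parameters(), lr=cfg.learning_rate)


if __name__ == "__main__":
    cfg = TrainConfig()
    # The paper only says "BERT-base"; the exact checkpoint is an assumption.
    encoder = BertModel.from_pretrained("bert-base-uncased")
    optimizer = build_optimizer(encoder, cfg)
    print(cfg)
```

This is a sketch of the reported settings only; the debiasing components themselves (random-walk path sampling and front-door adjustment) are not reconstructed here and should be taken from the authors' repository once released.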