reproducibilityindex.ai

Position Debiasing Fine-Tuning for Causal Perception in Long-Term Dialogue

Authors: Shixuan Fan, Wei Wei, Wendi Li, Xian-Ling Mao, Wenfeng Xie, Dangyang Chen

IJCAI 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results on two datasets prove that our proposed method can effectively alleviate the position bias for multiple LLMs and achieve significant progress compared with existing baselines.
Researcher Affiliation	Collaboration	(1) Cognitive Computing and Intelligent Information Processing (CCIIP) Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology; (2) Joint Laboratory of HUST and Pingan Property & Casualty Research (HPL); (3) Department of Computer Science and Technology, Beijing Institute of Technology; (4) Ping An Property & Casualty Insurance Company of China, Ltd. Emails: {fanshixuan, weiw, wendili}@hust.edu.cn, maoxl@bit.edu.cn, julian wind@163.com, chendangyang273@pingan.com.cn
Pseudocode	No	The paper describes its methods using prose and mathematical equations but does not include formal pseudocode blocks or algorithms.
Open Source Code	No	The paper does not provide any explicit statement or link for open-source code for the described methodology.
Open Datasets	Yes	To evaluate the effectiveness of our proposed method, following previous works [Wang et al., 2023; Feng et al., 2023], we conduct experiments on two widely used benchmark datasets, ESConv [Liu et al., 2021] and MSC [Xu et al., 2022a], for long-term dialogue. We use the same data preprocessing and train/valid/test splitting strategy as in [Feng et al., 2023].
Dataset Splits	Yes	We use the same data preprocessing and train/valid/test splitting strategy as in [Feng et al., 2023].
Hardware Specification	No	The paper does not specify any hardware details such as GPU models, CPU models, or memory specifications used for the experiments. It only mentions using Llama2-7B-chat and Qwen-14B-chat, which are models, not hardware.
Software Dependencies	No	The paper mentions using 'Adam optimizer', 'lora', and 'bitsandbytes' but does not provide specific version numbers for these software dependencies (e.g., PyTorch 1.x, bitsandbytes 0.x.x).
Experiment Setup	Yes	Throughout the experiments, we use Adam optimizer [Kingma and Ba, 2015] with 3e-4 initial learning rate and the 128 batch size. All methods are trained for up to 12 epochs. To improve experimental efficiency, we use lora [Hu et al., 2021] with rank 32 to fine-tune large language models. Both training and inference use 4-bit weight quantization by bitsandbytes [Dettmers et al., 2022].
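
Since the paper releases no code, the reported setup can only be restated, not reproduced exactly. The following is a minimal sketch that records the hyperparameters quoted in the Experiment Setup row and derives two quantities from them. All names (`ExperimentSetup`, `lora_extra_params`, `optimizer_steps`) are hypothetical, and the 4096 x 4096 matrix shape and the example count of 1000 are illustrative assumptions rather than values from the paper. The paper does not state which weight matrices receive LoRA adapters.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSetup:
    """Hyperparameters as quoted in the Experiment Setup row (names are hypothetical)."""
    optimizer: str = "Adam"
    learning_rate: float = 3e-4   # initial learning rate
    batch_size: int = 128
    max_epochs: int = 12          # "trained for up to 12 epochs"
    lora_rank: int = 32
    weight_bits: int = 4          # 4-bit weight quantization (bitsandbytes)


def lora_extra_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out weight matrix.

    LoRA learns two low-rank factors, A (rank x d_in) and B (d_out x rank),
    so the count is rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


def optimizer_steps(num_examples: int, setup: ExperimentSetup) -> int:
    """Upper bound on optimizer steps, assuming no gradient accumulation
    and that the last partial batch of each epoch is kept."""
    steps_per_epoch = math.ceil(num_examples / setup.batch_size)
    return steps_per_epoch * setup.max_epochs


if __name__ == "__main__":
    setup = ExperimentSetup()
    # Assumed shape: a square 4096 x 4096 projection (illustrative only).
    print(lora_extra_params(4096, 4096, setup.lora_rank))
    # Assumed dataset size of 1000 training examples (illustrative only).
    print(optimizer_steps(1000, setup))
```

A frozen dataclass keeps the quoted values in one immutable place, so any deviation in a reproduction attempt has to be made explicitly.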