Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Position Debiasing Fine-Tuning for Causal Perception in Long-Term Dialogue
Authors: Shixuan Fan, Wei Wei, Wendi Li, Xian-Ling Mao, Wenfeng Xie, Dangyang Chen
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on two datasets prove that our proposed method can effectively alleviate the position bias for multiple LLMs and achieve significant progress compared with existing baselines. |
| Researcher Affiliation | Collaboration | (1) Cognitive Computing and Intelligent Information Processing (CCIIP) Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology; (2) Joint Laboratory of HUST and Pingan Property & Casualty Research (HPL); (3) Department of Computer Science and Technology, Beijing Institute of Technology; (4) Ping An Property & Casualty Insurance Company of China, Ltd. |
| Pseudocode | No | The paper describes its methods using prose and mathematical equations but does not include formal pseudocode blocks or algorithms. |
| Open Source Code | No | The paper does not provide any explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | To evaluate the effectiveness of our proposed method, following previous works [Wang et al., 2023; Feng et al., 2023], we conduct experiments on two widely used benchmark datasets, ESConv [Liu et al., 2021] and MSC [Xu et al., 2022a], for long-term dialogue. We use the same data preprocessing and train/valid/test splitting strategy as in [Feng et al., 2023]. |
| Dataset Splits | Yes | We use the same data preprocessing and train/valid/test splitting strategy as in [Feng et al., 2023]. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU models, or memory specifications used for the experiments. It only mentions using Llama2-7B-chat and Qwen-14B-chat, which are models, not hardware. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer', 'lora', and 'bitsandbytes' but does not provide specific version numbers for these software dependencies (e.g., PyTorch 1.x, bitsandbytes 0.x.x). |
| Experiment Setup | Yes | Throughout the experiments, we use Adam optimizer [Kingma and Ba, 2015] with 3e-4 initial learning rate and the 128 batch size. All methods are trained for up to 12 epochs. To improve experimental efficiency, we use lora [Hu et al., 2021] with rank 32 to fine-tune large language models. Both training and inference use 4-bit weight quantization by bitsandbytes [Dettmers et al., 2022]. |
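
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is illustrative only and is not the authors' code: the names `ExperimentSetup`, `lora_extra_params`, and `accumulation_steps` are hypothetical, the 4096 dimension is the Llama2-7B hidden size used purely as an example (the excerpt does not say which layers LoRA was applied to), and the per-device batch of 8 is an assumption, since the paper reports no hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSetup:
    """Values quoted in the Experiment Setup row (field names are hypothetical)."""
    optimizer: str = "adam"
    learning_rate: float = 3e-4   # initial learning rate
    batch_size: int = 128
    max_epochs: int = 12          # "trained for up to 12 epochs"
    lora_rank: int = 32
    weight_bits: int = 4          # 4-bit weight quantization (bitsandbytes)


def lora_extra_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) weight matrix.

    LoRA learns two low-rank factors, A (rank x d_in) and B (d_out x rank),
    so the count is rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


def accumulation_steps(batch_size: int, per_device_batch: int, num_devices: int = 1) -> int:
    """Gradient-accumulation steps needed to reach the target batch size.

    Assumption: the reported batch size is an effective batch size that may be
    reached through accumulation; the paper does not state this.
    """
    per_step = per_device_batch * num_devices
    if per_step <= 0 or batch_size % per_step != 0:
        raise ValueError("batch_size must be a positive multiple of per_device_batch * num_devices")
    return batch_size // per_step


setup = ExperimentSetup()

# Example: one square 4096 x 4096 projection adapted at the reported rank.
extra = lora_extra_params(4096, 4096, setup.lora_rank)

# Example: a single device that fits 8 sequences per step (assumed, not reported).
steps = accumulation_steps(setup.batch_size, per_device_batch=8)

print(extra, steps)
```

Keeping the reported values in one frozen dataclass makes it easy to see what a replication would still have to guess: the missing hardware and library versions flagged in the table are exactly the inputs that `accumulation_steps` has to assume.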