Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Causally Motivated Sycophancy Mitigation for Large Language Models
Authors: Haoxi Li, Xueyang Tang, Jie Zhang, Song Guo, Sikai Bai, Peiran Dong, Yue Yu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted across diverse language tasks to demonstrate the superiority of our method over state-of-the-art competitors in mitigating sycophancy in LLMs. |
| Researcher Affiliation | Collaboration | The Hong Kong University of Science and Technology; The Hong Kong Polytechnic University; Peng Cheng Laboratory |
| Pseudocode | No | The paper describes its methodology using structured causal models and mathematical equations (e.g., objective functions), but it does not include any explicitly labeled pseudocode blocks or algorithms. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to a code repository. The text only refers to supplementary materials for detailed descriptions of datasets and baselines, but not code. |
| Open Datasets | Yes | Our primary evaluation suite is Sycophancy Eval, which extends existing assessments by incorporating realistic, open-ended text-generation tasks. This suite is based on the work of Sharma et al. (2024) and includes subsets of six QA datasets: (i) MMLU (Hendrycks et al., 2020); (ii) MATH (Hendrycks et al., 2021); (iii) AQuA (Ling et al., 2017); (iv) TruthfulQA (Lin et al., 2021); (v) TriviaQA (Joshi et al., 2017); and (vi) Poem (Sharma et al., 2024). |
| Dataset Splits | Yes | Specifically, we split TruthfulQA into halves: one for development (split 4:1 for training and validation) and the other for testing. |
| Hardware Specification | Yes | In addition, all experiments are implemented on four NVIDIA Geforce A100 GPUs. |
| Software Dependencies | No | The paper mentions the "LangChain library" in Section A.2.3 but does not provide a specific version number for it or for any other software dependency. |
| Experiment Setup | Yes | We perform three training epochs (2:1) alternately to update intervention prompts and heads weight matrix, and set their learning rates to 1e-5 and 2e-3, respectively. The total number of epochs is 40. ... We sweep two hyperparameters, K and λ, controlling the strength of calibration, using 5% of randomly sampled questions from TruthfulQA for training and validation. The optimal hyperparameters are K = 48 and λ = 0.1. |
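The Dataset Splits and Experiment Setup rows describe a two-level TruthfulQA split and a 2:1 alternating update schedule. The Python below is a minimal sketch under stated assumptions, not the authors' code (the paper releases none). The helper names `split_truthfulqa` and `alternating_schedule` and the constant `LEARNING_RATES` are hypothetical. The seeded shuffle is assumed. "Three training epochs (2:1) alternately" is read as two epochs updating the intervention prompts followed by one epoch updating the head weight matrix, with that cycle repeated over 40 epochs.

```python
import random


def split_truthfulqa(items, seed=0):
    """Hypothetical helper: halve the data into development and test sets,
    then split the development half 4:1 into training and validation.
    The seeded shuffle is an assumption; the paper does not describe it."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    dev, test = shuffled[:half], shuffled[half:]
    n_train = (len(dev) * 4) // 5  # 4:1 train/validation ratio
    train, val = dev[:n_train], dev[n_train:]
    return train, val, test


def alternating_schedule(total_epochs=40, pattern=("prompts", "prompts", "heads")):
    """Assumed reading of the 2:1 alternation: in each 3-epoch cycle,
    two epochs update intervention prompts and one updates head weights."""
    return [pattern[epoch % len(pattern)] for epoch in range(total_epochs)]


# Learning rates reported in the table, keyed by which parameters are updated.
LEARNING_RATES = {"prompts": 1e-5, "heads": 2e-3}

if __name__ == "__main__":
    train, val, test = split_truthfulqa(range(100))
    print(len(train), len(val), len(test))
    schedule = alternating_schedule()
    for epoch, target in enumerate(schedule[:6]):
        print(epoch, target, LEARNING_RATES[target])
```

Under this reading, 40 epochs would give 27 prompt-update epochs and 13 head-update epochs, because 40 is not a multiple of 3. The paper does not say how the remainder is handled, so the schedule here simply lets the cycle run on.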