Alignment for Honesty

Authors: Yuqing Yang, Ethan Chern, Xipeng Qiu, Graham Neubig, Pengfei Liu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | Our extensive experiments reveal that these aligned models show a marked increase in honesty, as indicated by our proposed metrics. We open-source all relevant resources to facilitate future research at https://github.com/GAIR-NLP/alignment-for-honesty.
Researcher Affiliation | Collaboration | Yuqing Yang (3,5), Ethan Chern (1,5), Xipeng Qiu (3), Graham Neubig (4), Pengfei Liu (1,2,5); 1 Shanghai Jiao Tong University, 2 Shanghai Artificial Intelligence Laboratory, 3 Fudan University, 4 Carnegie Mellon University, 5 Generative AI Research Lab (GAIR)
Pseudocode | No | The paper describes methods and illustrates concepts with figures, but it does not include a formal pseudocode or algorithm block.
Open Source Code | Yes | We open-source all relevant resources to facilitate future research at https://github.com/GAIR-NLP/alignment-for-honesty.
Open Datasets | Yes | To perform honesty-oriented supervised fine-tuning, we sample 8,000 examples from a large-scale knowledge-based question answering (QA) dataset, TriviaQA (Joshi et al., 2017), as our training dataset, and label contrastive samples as described in Section 3.2. (See the data-sampling sketch after this table.)
Dataset Splits | No | The paper mentions training and evaluation (test) sets but does not explicitly specify a separate validation split, with counts or percentages, for hyperparameter tuning or model selection.
Hardware Specification | Yes | All experiments were conducted using A100 GPUs.
Software Dependencies | No | The paper mentions using CoLLiE for full-parameter fine-tuning and the AdamW optimizer, but it does not provide version numbers for these or for other software dependencies such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | For model training, we rely on CoLLiE (Lv et al., 2023) for full-parameter fine-tuning. In particular, we utilized the AdamW optimizer (Loshchilov and Hutter, 2019) with a learning rate of 1e-6 and a weight decay of 0.1. We trained MULTISAMPLE for 1 epoch and other methods for 2 epochs, with a warm-up ratio set to 0.05 and batch size 8. (See the training-configuration sketch after this table.)
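
The Open Datasets row reports sampling 8,000 TriviaQA questions as the pool for honesty-oriented supervised fine-tuning. Below is a minimal sketch of that sampling step, assuming the Hugging Face `trivia_qa` dataset (`rc.nocontext` config) and an illustrative random seed; the paper does not state which TriviaQA subset or sampling procedure was used, and the contrastive labeling from Section 3.2 happens afterwards and is not shown.

```python
# Sketch: draw 8,000 TriviaQA questions as the honesty-SFT pool.
# Assumes the Hugging Face "trivia_qa" dataset (rc.nocontext config);
# the exact subset, seed, and sampling procedure are illustrative choices,
# not details taken from the paper.
import random

from datasets import load_dataset

random.seed(0)  # illustrative seed; the paper does not report one

trivia_qa = load_dataset("trivia_qa", "rc.nocontext", split="train")
indices = random.sample(range(len(trivia_qa)), k=8_000)
sft_pool = trivia_qa.select(indices)

# Each sampled item keeps its question and gold answer; contrastive
# "known"/"unknown" labels would be assigned in a later step (Sec. 3.2).
print(sft_pool[0]["question"], sft_pool[0]["answer"]["value"])
```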
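
The Experiment Setup row lists the reported fine-tuning hyperparameters. The authors train with CoLLiE; the sketch below only collects those settings in a Hugging Face `TrainingArguments` object as a familiar stand-in, so the output directory, the scheduler type, and the per-device reading of the batch size are assumptions rather than details from the paper.

```python
# Sketch of the reported hyperparameters, expressed via Hugging Face
# TrainingArguments as a stand-in for the authors' CoLLiE setup.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="honesty-sft",        # illustrative path; not specified in the paper
    learning_rate=1e-6,              # AdamW learning rate (paper)
    weight_decay=0.1,                # AdamW weight decay (paper)
    warmup_ratio=0.05,               # warm-up ratio (paper)
    per_device_train_batch_size=8,   # batch size 8 (per-device vs. global not stated)
    num_train_epochs=2,              # 2 epochs; MULTISAMPLE uses 1 (paper)
    optim="adamw_torch",             # AdamW optimizer, as in the paper
    lr_scheduler_type="linear",      # assumption; schedule type not reported
)
```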