Alignment for Honesty
Authors: Yuqing Yang, Ethan Chern, Xipeng Qiu, Graham Neubig, Pengfei Liu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments reveal that these aligned models show a marked increase in honesty, as indicated by our proposed metrics. We open-source all relevant resources to facilitate future research at https://github.com/GAIR-NLP/alignment-for-honesty. |
| Researcher Affiliation | Collaboration | Yuqing Yang (3,5), Ethan Chern (1,5), Xipeng Qiu (3), Graham Neubig (4), Pengfei Liu (1,2,5) — 1: Shanghai Jiao Tong University, 2: Shanghai Artificial Intelligence Laboratory, 3: Fudan University, 4: Carnegie Mellon University, 5: Generative AI Research Lab (GAIR) |
| Pseudocode | No | The paper describes methods and illustrates concepts with figures, but it does not include a formal pseudocode or algorithm block. |
| Open Source Code | Yes | We open-source all relevant resources to facilitate future research at https://github.com/GAIR-NLP/alignment-for-honesty. |
| Open Datasets | Yes | To perform honesty-oriented supervised fine-tuning, we sample 8,000 examples from a large-scale knowledge-based question answering (QA) dataset, TriviaQA (Joshi et al., 2017), as our training dataset, and label contrastive samples as described in Section 3.2. (See the data-sampling sketch after the table.) |
| Dataset Splits | No | The paper mentions training and evaluation (test) datasets but does not explicitly specify a separate validation dataset split with percentages or counts for hyperparameter tuning or model selection. |
| Hardware Specification | Yes | All experiments were conducted using A100 GPUs. |
| Software Dependencies | No | The paper mentions using 'CoLLiE' for full parameter fine-tuning and the 'AdamW optimizer' but does not provide specific version numbers for these or other software dependencies like Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | For model training, we rely on CoLLiE (Lv et al., 2023) for full parameter fine-tuning. In particular, we utilized the AdamW optimizer (Loshchilov and Hutter, 2019) with a learning rate of 1e-6 and a weight decay of 0.1. We trained MULTISAMPLE for 1 epoch and other methods for 2 epochs, with a warm-up ratio set to 0.05 and batch size 8. (A hedged configuration sketch follows the table.) |
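
The Open Datasets row quotes the paper's data construction: 8,000 questions sampled from TriviaQA. Below is a minimal sketch of that sampling step, assuming the Hugging Face `datasets` copy of TriviaQA (the `rc.nocontext` configuration) and an arbitrary shuffle seed; the paper's honesty-oriented contrastive labeling (its Section 3.2) is not reproduced here.

```python
# Minimal sketch: sample 8,000 TriviaQA questions as a training pool.
# Assumptions: the Hugging Face "trivia_qa" dataset with the "rc.nocontext"
# config and seed 42 are illustrative choices, not the paper's exact setup.
from datasets import load_dataset

trivia_qa = load_dataset("trivia_qa", "rc.nocontext", split="train")
train_subset = trivia_qa.shuffle(seed=42).select(range(8000))  # 8,000 examples

print(len(train_subset))            # 8000
print(train_subset[0]["question"])  # inspect one sampled question
```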
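
The Experiment Setup row reports the fine-tuning hyperparameters but not the paper's CoLLiE training script. The sketch below maps those reported values onto Hugging Face `TrainingArguments` as a stand-in for CoLLiE; the output directory, model choice, and bf16 setting are assumptions added for illustration.

```python
# Minimal sketch of the reported hyperparameters, expressed with Hugging Face
# TrainingArguments rather than CoLLiE (which the paper actually used).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./honesty-sft",      # hypothetical output path
    per_device_train_batch_size=8,   # batch size 8, as reported
    num_train_epochs=2,              # 2 epochs (1 epoch for MULTISAMPLE)
    learning_rate=1e-6,              # AdamW learning rate 1e-6
    weight_decay=0.1,                # weight decay 0.1
    warmup_ratio=0.05,               # warm-up ratio 0.05
    optim="adamw_torch",             # AdamW optimizer
    bf16=True,                       # assumption: mixed precision on A100 GPUs
)
```

The values above come directly from the quoted setup; only the container (Transformers' trainer configuration instead of CoLLiE) is swapped in for readability.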