GRATH: Gradual Self-Truthifying for Large Language Models
Authors: Weixin Chen, Dawn Song, Bo Li
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we evaluate GRATH using different 7B-LLMs and compare with LLMs of similar or even larger sizes on benchmark datasets. Our results show that GRATH effectively improves LLMs' truthfulness without compromising other core capabilities. Notably, GRATH achieves state-of-the-art performance on TruthfulQA, with MC1 accuracy of 54.71% and MC2 accuracy of 69.10%, which even surpass those of 70B-LLMs. |
| Researcher Affiliation | Academia | Weixin Chen (UIUC); Dawn Song (UC Berkeley); Bo Li (UIUC, UChicago). |
| Pseudocode | Yes | Algorithm 1 GRATH |
| Open Source Code | Yes | The code is available at https://github.com/chenweixin107/GRATH. |
| Open Datasets | Yes | When creating pairwise truthfulness training data, the authors use training samples from ARC-Challenge (Clark et al., 2018) as the source of questions, and six QA primer examples from TruthfulQA as few-shot demonstrations, following the implementation in (Zou et al., 2023); a sketch of this data construction follows the table. |
| Dataset Splits | No | The paper mentions using training and testing samples from various datasets (ARC-Challenge, HellaSwag, MMLU, TruthfulQA) for evaluation. However, it does not provide specific details on how these datasets were split into training, validation, and test sets for the main experiments, or whether a separate validation set was used for hyperparameter tuning. While it mentions splitting TruthfulQA into 700 training and 117 testing samples for a specific ablation study, this is not presented as the general, reproducible split for all experiments. |
| Hardware Specification | Yes | We adopt the default parameter configurations and implement DPO using one RTX A6000 GPU. |
| Software Dependencies | No | The paper mentions the use of the 'Transformer Reinforcement Learning (TRL) library (von Werra et al., 2020)' and the 'parameter-efficient technique LoRA (Hu et al., 2022)'. However, it does not specify version numbers for these libraries, which are necessary for reproducible software dependencies. |
| Experiment Setup | Yes | We adopt DPO to fine-tune the model for 1000 steps using the parameter-efficient technique LoRA (Hu et al., 2022), setting its rank as 8, alpha as 16, and dropout parameter as 0.05 (see the training sketch after this table). |
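The following is a minimal sketch of how the pairwise truthfulness data described in the Open Datasets row could be assembled: ARC-Challenge training questions are prefixed with TruthfulQA-style QA primer examples, and the base model is prompted to produce a correct ("chosen") and an incorrect ("rejected") answer for each question. The primer text, the `generate_answer` helper, and the exact prompt wording are illustrative assumptions, not the authors' implementation (which follows Zou et al., 2023).

```python
# Hypothetical sketch of pairwise truthfulness data construction.
# The primer text and generate_answer() helper are illustrative stand-ins.
from datasets import load_dataset

QA_PRIMERS = (
    "Q: What is human life expectancy in the United States?\n"
    "A: Human life expectancy in the United States is 78 years.\n\n"
    # ... five more QA primer examples from TruthfulQA would go here ...
)

def generate_answer(model, tokenizer, prompt: str) -> str:
    """Greedy generation of a short answer continuation (illustrative)."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=32, do_sample=False)
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

def build_pairwise_examples(model, tokenizer):
    """Use ARC-Challenge training questions as the question source and prompt
    the model for a correct ("chosen") and an incorrect ("rejected") answer."""
    arc = load_dataset("ai2_arc", "ARC-Challenge", split="train")
    pairs = []
    for sample in arc:
        question = sample["question"]
        correct = generate_answer(
            model, tokenizer, QA_PRIMERS + f"Q: {question}\nA (correct answer):"
        )
        incorrect = generate_answer(
            model, tokenizer, QA_PRIMERS + f"Q: {question}\nA (incorrect answer):"
        )
        pairs.append({"prompt": f"Q: {question}\nA:",
                      "chosen": correct, "rejected": incorrect})
    return pairs
```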
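Below is a minimal sketch of the reported DPO fine-tuning setup, assuming the Hugging Face TRL and PEFT APIs roughly as of early 2024. The LoRA rank/alpha/dropout and the 1000-step budget match the values quoted in the Experiment Setup row; the model name, batch size, and learning rate are placeholders not reported in this table.

```python
# Sketch of DPO fine-tuning with LoRA (rank 8, alpha 16, dropout 0.05) for 1000 steps.
# `pairwise_data` stands in for GRATH's pairwise truthfulness dataset and must
# contain "prompt", "chosen", and "rejected" columns.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "meta-llama/Llama-2-7b-hf"  # any 7B base model; placeholder
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# LoRA configuration matching the reported hyperparameters.
peft_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")

# 1000 DPO steps, as stated in the experiment setup; other values are illustrative.
training_args = TrainingArguments(
    output_dir="grath-dpo",
    max_steps=1000,
    per_device_train_batch_size=2,  # illustrative; not reported in this table
    learning_rate=5e-5,             # illustrative; not reported in this table
)

# Toy pairwise example in the expected column format.
pairwise_data = Dataset.from_dict({
    "prompt":   ["Q: Which planet is known as the Red Planet?\nA:"],
    "chosen":   [" Mars."],   # truthful answer
    "rejected": [" Venus."],  # untruthful answer
})

trainer = DPOTrainer(
    model,
    ref_model=None,          # with a PEFT config, TRL handles the reference model
    args=training_args,
    train_dataset=pairwise_data,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```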