GRATH: Gradual Self-Truthifying for Large Language Models
Authors: Weixin Chen, Dawn Song, Bo Li
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we evaluate GRATH using different 7B-LLMs and compare with LLMs of similar or even larger sizes on benchmark datasets. Our results show that GRATH effectively improves LLMs' truthfulness without compromising other core capabilities. Notably, GRATH achieves state-of-the-art performance on TruthfulQA, with MC1 accuracy of 54.71% and MC2 accuracy of 69.10%, which even surpass those of 70B-LLMs. |
| Researcher Affiliation | Academia | Weixin Chen (UIUC); Dawn Song (UC Berkeley); Bo Li (UIUC, UChicago). |
| Pseudocode | Yes | Algorithm 1 GRATH |
| Open Source Code | Yes | The code is available at https://github.com/chenweixin107/GRATH. |
| Open Datasets | Yes | When creating pairwise truthfulness training data, the authors use training samples from ARC-Challenge (Clark et al., 2018) as the source of questions, and six QA primer examples from TruthfulQA as few-shot demonstrations, following the implementation in (Zou et al., 2023); a sketch of this data construction follows the table. |
| Dataset Splits | No | The paper mentions using training and testing samples from various datasets (ARC-Challenge, HellaSwag, MMLU, TruthfulQA) for evaluation. However, it does not provide specific details on how these datasets were split into training, validation, and test sets for the main experiments, or whether a separate validation set was used for hyperparameter tuning. While it mentions splitting TruthfulQA into 700 training and 117 testing samples for a specific ablation study, this is not presented as the general, reproducible split for all experiments. |
| Hardware Specification | Yes | We adopt the default parameter configurations and implement DPO using one RTX A6000 GPU. |
| Software Dependencies | No | The paper mentions the use of the 'Transformer Reinforcement Learning (TRL) library (von Werra et al., 2020)' and the 'parameter-efficient technique LoRA (Hu et al., 2022)'. However, it does not specify version numbers for these libraries, which are necessary for reproducible software dependencies. |
| Experiment Setup | Yes | We adopt DPO to fine-tune the model for 1000 steps using the parameter-efficient technique LoRA (Hu et al., 2022), setting its rank as 8, alpha as 16, and dropout parameter as 0.05 (see the training sketch after this table). |
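The following is a minimal sketch of how the pairwise truthfulness data described in the Open Datasets row could be assembled: ARC-Challenge training questions are prefixed with TruthfulQA-style QA primer examples, and the base model is prompted to produce a correct ("chosen") and an incorrect ("rejected") answer for each question. The primer text, the `generate_answer` helper, and the exact prompt wording are illustrative assumptions, not the authors' implementation (which follows Zou et al., 2023).

```python
# Hypothetical sketch of pairwise truthfulness data construction.
# The primer text and generate_answer() helper are illustrative stand-ins.
from datasets import load_dataset

QA_PRIMERS = (
    "Q: What is human life expectancy in the United States?\n"
    "A: Human life expectancy in the United States is 78 years.\n\n"
    # ... five more QA primer examples from TruthfulQA would go here ...
)

def generate_answer(model, tokenizer, prompt: str) -> str:
    """Greedy generation of a short answer continuation (illustrative)."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=32, do_sample=False)
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

def build_pairwise_examples(model, tokenizer):
    """Use ARC-Challenge training questions as the question source and prompt
    the model for a correct ("chosen") and an incorrect ("rejected") answer."""
    arc = load_dataset("ai2_arc", "ARC-Challenge", split="train")
    pairs = []
    for sample in arc:
        question = sample["question"]
        correct = generate_answer(
            model, tokenizer, QA_PRIMERS + f"Q: {question}\nA (correct answer):"
        )
        incorrect = generate_answer(
            model, tokenizer, QA_PRIMERS + f"Q: {question}\nA (incorrect answer):"
        )
        pairs.append({"prompt": f"Q: {question}\nA:",
                      "chosen": correct, "rejected": incorrect})
    return pairs
```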
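Below is a minimal sketch of the reported DPO fine-tuning setup, assuming the Hugging Face TRL and PEFT APIs roughly as of early 2024. The LoRA rank/alpha/dropout and the 1000-step budget match the values quoted in the Experiment Setup row; the model name, batch size, and learning rate are placeholders not reported in this table.

```python
# Sketch of DPO fine-tuning with LoRA (rank 8, alpha 16, dropout 0.05) for 1000 steps.
# `pairwise_data` stands in for GRATH's pairwise truthfulness dataset and must
# contain "prompt", "chosen", and "rejected" columns.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "meta-llama/Llama-2-7b-hf"  # any 7B base model; placeholder
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# LoRA configuration matching the reported hyperparameters.
peft_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")

# 1000 DPO steps, as stated in the experiment setup; other values are illustrative.
training_args = TrainingArguments(
    output_dir="grath-dpo",
    max_steps=1000,
    per_device_train_batch_size=2,  # illustrative; not reported in this table
    learning_rate=5e-5,             # illustrative; not reported in this table
)

# Toy pairwise example in the expected column format.
pairwise_data = Dataset.from_dict({
    "prompt":   ["Q: Which planet is known as the Red Planet?\nA:"],
    "chosen":   [" Mars."],   # truthful answer
    "rejected": [" Venus."],  # untruthful answer
})

trainer = DPOTrainer(
    model,
    ref_model=None,          # with a PEFT config, TRL handles the reference model
    args=training_args,
    train_dataset=pairwise_data,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```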