Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning
Authors: Zongmeng Zhang, Yufeng Shi, Jinhua Zhu, Wengang Zhou, Xiang Qi, Peng Zhang, Houqiang Li
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct experiments and analyze TRUSTWORTHY-ALIGNMENT to answer (RQ3): how does the proposed algorithm perform, and what are the side effects of aligning retrieval-augmented LLMs to a trustworthy status? We first detail the datasets, metrics and compared baselines. Then we analyze the performance, generalization capability, alignment tax and human evaluation of TRUSTWORTHY-ALIGNMENT. Finally, we investigate the impact of individual components of TRUSTWORTHY-ALIGNMENT by carrying out ablation studies on annealing exploration, scaling law and critic model. |
| Researcher Affiliation | Collaboration | 1. University of Science and Technology of China; 2. Institute of Artificial Intelligence, Hefei Comprehensive National Science Center; 3. Ant Group. |
| Pseudocode | Yes | Algorithm 1: TRUSTWORTHY-ALIGNMENT; Algorithm 2: Collapse Detection |
| Open Source Code | Yes | Our code is available at https://github.com/zmzhang2000/trustworthy-alignment. |
| Open Datasets | Yes | We train TRUSTWORTHY-ALIGNMENT on the MRQA Workshop Shared Task's version (Fisch et al., 2019) of Natural Questions (Kwiatkowski et al., 2019). They construct a more coherent and convincing dataset, ConflictQA, on top of an entity-centric QA dataset, PopQA (Mallen et al., 2023), and a multi-step reasoning dataset, StrategyQA (Geva et al., 2021), by prompting LLMs to directly generate supporting evidence that contradicts the parametric memory. |
| Dataset Splits | No | The original and substituted versions of the training set are used for training, while only the substituted version of the dev set is used for evaluation (a minimal sketch of how a substituted example can be built appears after the table). No specific percentages or counts for the splits are provided. |
| Hardware Specification | Yes | Experiments involving Llama-2-13b-chat or larger critic models are conducted on 8 A100-80G GPUs. Other experiments are conducted on 8 3090 GPUs. |
| Software Dependencies | No | We implement TRUSTWORTHY-ALIGNMENT based on the DeepSpeed-Chat framework (Yao et al., 2023) with LoRA (Hu et al., 2021). No specific version numbers for these software components are provided (a hedged LoRA configuration sketch follows the table). |
| Experiment Setup | Yes | We list hyperparameters in Table 4. All hyperparameters remain the same in the alignment of all pre-trained models. (Table 4 includes: α0, α1, α2, α3, γ, λ, optimizer, training batch size, max prompt sequence length, actor learning rate, critic learning rate, actor weight decay, critic weight decay, training epochs, learning rate scheduler, warmup steps, actor lora dim, actor lora learning rate, critic lora dim, critic lora learning rate, max new token number, do sample, top-p, number of beams, repetition penalty, initial temperature τ_initial, final temperature τ_final, temperature decay type.) A hedged sketch of the annealed temperature schedule appears after the table. |
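
The paper trains on both the original and a "substituted" version of the Natural Questions training set, where the retrieved context is made to contradict the model's parametric memory. The sketch below is not the authors' construction code; it only illustrates the answer-substitution idea, and the field names (`question`, `context`, `answers`) are assumed, not the dataset's exact schema.

```python
# Minimal sketch (not the authors' code): build a "substituted" QA example by
# swapping the gold answer in the evidence passage for a conflicting entity, so
# the retrieved context contradicts the model's parametric memory.

def substitute_context(example: dict, substitute_answer: str) -> dict:
    """Return a copy of an NQ-style example whose context supports a new answer."""
    original_answer = example["answers"][0]
    new_context = example["context"].replace(original_answer, substitute_answer)
    return {
        "question": example["question"],
        "context": new_context,
        "answers": [substitute_answer],  # the answer the aligned model should now give
    }


if __name__ == "__main__":
    example = {
        "question": "Who wrote the novel 1984?",
        "context": "1984 is a dystopian novel written by George Orwell in 1949.",
        "answers": ["George Orwell"],
    }
    print(substitute_context(example, "Aldous Huxley"))
```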
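
The paper states only that it builds on DeepSpeed-Chat with LoRA, without version numbers. As a hedged illustration of what a LoRA setup for the actor model could look like, the snippet below uses Hugging Face `peft` (an assumption; DeepSpeed-Chat ships its own LoRA layer). The model id and all numeric values are placeholders, not the values from the paper's Table 4.

```python
# Illustrative sketch only: apply LoRA adapters to a causal-LM actor via `peft`.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

actor = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-chat-hf")

lora_cfg = LoraConfig(
    r=8,                                   # analogous to the "actor lora dim" hyperparameter
    lora_alpha=16,                         # placeholder scaling factor
    lora_dropout=0.05,                     # placeholder
    target_modules=["q_proj", "v_proj"],   # assumed attention projections
    task_type="CAUSAL_LM",
)

actor = get_peft_model(actor, lora_cfg)
actor.print_trainable_parameters()  # confirms only the LoRA adapters are trainable
```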
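
Table 4 lists an initial temperature, a final temperature, and a temperature decay type, and the paper ablates annealing exploration. The schedule below is a minimal sketch of how such an annealed sampling temperature could be computed; the decay formulas are illustrative, not the authors' exact schedule.

```python
# Minimal sketch of an annealed sampling-temperature schedule, decaying from
# tau_initial to tau_final over training (decay formulas are illustrative).

def annealed_temperature(step: int, total_steps: int,
                         tau_initial: float = 1.0, tau_final: float = 0.1,
                         decay: str = "linear") -> float:
    progress = min(step / max(total_steps, 1), 1.0)
    if decay == "linear":
        return tau_initial + (tau_final - tau_initial) * progress
    if decay == "exponential":
        return tau_initial * (tau_final / tau_initial) ** progress
    raise ValueError(f"unknown decay type: {decay}")


# Example: the temperature passed to generation at step 500 of 1000.
print(annealed_temperature(500, 1000, decay="exponential"))
```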