Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis
Authors: Kai Chen, Chunwei Wang, Kuo Yang, Jianhua Han, Lanqing Hong, Fei Mi, Hang Xu, Zhengying Liu, Wenyong Huang, Zhenguo Li, Dit-Yan Yeung, Lifeng Shang, Xin Jiang, Qun Liu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results reveal that our method outperforms existing alignment approaches in enhancing model safety while maintaining the overall utility. |
| Researcher Affiliation | Collaboration | Kai Chen1, Chunwei Wang2, Kuo Yang2, Jianhua Han2, Lanqing Hong2, Fei Mi2, Hang Xu2, Zhengying Liu2, Wenyong Huang2, Zhenguo Li2, Dit-Yan Yeung1, Lifeng Shang2, Xin Jiang2, Qun Liu2; 1Hong Kong University of Science and Technology, 2Huawei Noah's Ark Lab |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. |
| Open Datasets | Yes | The PKU-SafeRLHF dataset (Dai et al., 2023) is used for both training and evaluation. [...] our training set is further augmented with an additional 52k helpful instructions from Taori et al. (2023). |
| Dataset Splits | No | The paper states that PKU-SafeRLHF is used for training and evaluation and cites specific test sets from AlpacaFarm and PKU-SafeRLHF, but it does not explicitly provide training/validation/test splits or the size or percentage of a distinct validation split. |
| Hardware Specification | Yes | Acknowledgement. We gratefully acknowledge the support of MindSpore, CANN (Compute Architecture for Neural Networks) and Ascend AI Processor used for this research. |
| Software Dependencies | No | The paper mentions deploying 'LoRA (Hu et al., 2021)' and using 'MindSpore' but does not provide specific version numbers for software dependencies or programming environments. |
| Experiment Setup | Yes | Furthermore, we deploy LoRA (Hu et al., 2021) by default in all Transformer linear layers, setting the rank to 16. To ensure a fair comparison, all methods under evaluation are fine-tuned for three epochs. [...] Each method undergoes fine-tuning for one epoch, beginning with an initial learning rate of 0.0001. |
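For concreteness, the "Open Datasets" and "Experiment Setup" rows above translate into roughly the following configuration. This is a minimal sketch assuming the HuggingFace `datasets`/`transformers`/`peft` stack; the paper reports MindSpore and Ascend hardware and releases no code, so the Hub dataset IDs, base-model path, LoRA alpha/dropout, and batch size below are illustrative assumptions. Only the rank (16), the initial learning rate (1e-4), and the three-epoch schedule are taken from the reported setup.

```python
# Sketch of the reported fine-tuning setup: LoRA (rank 16) on all Transformer
# linear layers, initial learning rate 1e-4, three epochs, with PKU-SafeRLHF
# training data alongside 52k helpful Alpaca instructions.
# Assumption: HuggingFace libraries are used here for illustration only;
# the paper itself reports MindSpore/Ascend and provides no open-source code.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, TaskType, get_peft_model

# Hub dataset IDs are assumptions; the paper names the datasets but not these IDs.
safety_data = load_dataset("PKU-Alignment/PKU-SafeRLHF", split="train")
helpful_data = load_dataset("tatsu-lab/alpaca", split="train")  # the 52k helpful instructions
# The two sources would still need to be mapped to a common instruction format before mixing.

base_model = AutoModelForCausalLM.from_pretrained("path/to/base-model")  # placeholder; base LLM not fixed here

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                          # rank reported in the paper
    lora_alpha=32,                 # assumed; not stated in the paper
    lora_dropout=0.05,             # assumed; not stated in the paper
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # "all Transformer linear layers"
)
model = get_peft_model(base_model, lora_config)

training_args = TrainingArguments(
    output_dir="./mistake-analysis-lora",  # hypothetical output path
    learning_rate=1e-4,                    # initial learning rate reported in the paper
    num_train_epochs=3,                    # three epochs in the main comparison
    per_device_train_batch_size=8,         # assumed; not stated in the quoted excerpt
)
```

The one-epoch fine-tuning mentioned in the same row appears to refer to a separate stage of the pipeline; under this sketch it would simply reuse the same configuration with `num_train_epochs=1`.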