Aligner: Efficient Alignment by Learning to Correct

Authors: Jiaming Ji, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Tianyi (Alex) Qiu, Juntao Dai, Yaodong Yang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate performance improvements by deploying the same Aligner model across 11 different LLMs, evaluated on the 3H dimensions (helpfulness, harmlessness, and honesty). Specifically, Aligner-7B has achieved an average improvement of 68.9% in helpfulness and 22.8% in harmlessness across the tested LLMs while also effectively reducing hallucination.
Researcher Affiliation | Academia | (1) Institute for AI, Peking University; (2) State Key Laboratory of General Artificial Intelligence, Institute for AI, Peking University
Pseudocode | Yes | Algorithm 1 Aligner Pseudocode
Open Source Code | Yes | Open Source. We also release all the training code, Aligner models, and the 100K Q-A-C dataset, AlignerTails, to empower the community to explore and advance correction paradigms.
Open Datasets | Yes | We utilize two open-source preference datasets, HH-RLHF [5] and PKU-SafeRLHF [19, 20], as our preference datasets. Considering that the preference pairs in PKU-SafeRLHF are generated solely by Alpaca-7B, we additionally construct a 50K preference dataset based on these two preference datasets using the correction paradigm.
Dataset Splits | No | The paper mentions training and evaluation datasets and various models, but does not explicitly provide the train/validation/test splits, whether as percentages or absolute counts, needed for reproducibility.
Hardware Specification | Yes | We conducted all training on 8 NVIDIA A800 GPUs.
Software Dependencies | No | The paper mentions using Python and DeepSpeed ZeRO-3 but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | For detailed training parameters, please see Appendix D. We trained the Aligner model at three scales: 2B, 7B, and 13B, using data volumes of 20K, 30K, 40K, and 50K. Throughout training, we used the AdamW optimizer, setting β1 to 0.9 and β2 to 0.95.
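The correction paradigm referenced in the Pseudocode and Open Datasets rows can be sketched as a two-stage inference pipeline: the frozen upstream LLM drafts an answer, and the Aligner, conditioned on both the query and the draft, emits a corrected answer. The sketch below is a minimal illustration under that assumption; the function names and toy stand-in models are hypothetical, not from the paper.

```python
def aligned_response(query, upstream_model, aligner):
    """Two-stage inference: the upstream LLM answers, the Aligner corrects.

    Both arguments are assumed to be callables wrapping real LLMs; the
    upstream model's weights are never modified (plug-and-play behavior,
    which is why one Aligner can sit atop 11 different upstream LLMs).
    """
    draft = upstream_model(query)   # stage 1: initial (possibly unsafe) answer
    return aligner(query, draft)    # stage 2: corrected answer


# Toy stand-ins that show only the data flow, not real models.
upstream = lambda q: f"draft answer to: {q}"
corrector = lambda q, a: f"corrected({a})"
```

A usage call such as `aligned_response("some query", upstream, corrector)` makes the flow visible: the Aligner sees both the original query and the upstream draft, so it can correct rather than regenerate from scratch.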
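Since the setup row reports only the AdamW betas (β1 = 0.9, β2 = 0.95), the following pure-Python, single-parameter sketch shows what those coefficients control: β1 smooths the gradient estimate, β2 smooths the squared-gradient estimate. The learning rate, epsilon, and weight decay used here are illustrative defaults, not values from the paper (which defers full hyperparameters to Appendix D).

```python
import math

def adamw_step(p, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.95,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter.

    beta1/beta2 follow the paper's reported settings (0.9 / 0.95);
    lr, eps, and weight_decay are assumptions for illustration only.
    """
    # Decoupled weight decay (the "W" in AdamW): applied directly to p,
    # not folded into the gradient as in classic L2-regularized Adam.
    p = p - lr * weight_decay * p
    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialized moment estimates at step t.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v
```

With β2 = 0.95 (rather than the common 0.999), the second-moment estimate adapts faster to recent gradient magnitudes, a choice frequently used in large-scale LLM training.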