Aligner: Efficient Alignment by Learning to Correct
Authors: Jiaming Ji, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Tianyi (Alex) Qiu, Juntao Dai, Yaodong Yang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate performance improvements by deploying the same Aligner model across 11 different LLMs, evaluated on the 3H dimensions (helpfulness, harmlessness, and honesty). Specifically, Aligner-7B has achieved an average improvement of 68.9% in helpfulness and 22.8% in harmlessness across the tested LLMs while also effectively reducing hallucination. |
| Researcher Affiliation | Academia | ¹Institute for AI, Peking University; ²State Key Laboratory of General Artificial Intelligence, Institute for AI, Peking University |
| Pseudocode | Yes | Algorithm 1 Aligner Pseudocode |
| Open Source Code | Yes | Open Source. We also release all the training code, Aligner models, and the 100K Q-A-C dataset, AlignerTails, to empower the community to explore and advance correction paradigms. (A minimal inference sketch of the correction paradigm follows the table.) |
| Open Datasets | Yes | We utilize two open-source preference datasets, HH-RLHF [5] and PKU-SafeRLHF [19, 20], as our preference datasets. Considering that the preference pairs in PKU-SafeRLHF are generated solely by Alpaca-7B, we additionally construct a 50K preference dataset based on these two preference datasets using the correction paradigm. |
| Dataset Splits | No | The paper mentions training, evaluation datasets, and various models, but does not explicitly provide the train/validation/test splits by percentage or absolute counts for reproducibility. |
| Hardware Specification | Yes | We conducted all training on 8× NVIDIA A800 GPUs. |
| Software Dependencies | No | The paper mentions using Python and DeepSpeed ZeRO-3 but does not provide specific version numbers for these or other software dependencies. (A hedged ZeRO-3 config sketch follows the table.) |
| Experiment Setup | Yes | For detailed training parameters, please see Appendix D. We trained the Aligner model at three scales: 2B, 7B, and 13B, using data volumes of 20K, 30K, 40K, and 50K. Throughout training, we used the AdamW optimizer, setting β1 to 0.9 and β2 to 0.95. (See the optimizer sketch after this table.) |
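
The Pseudocode and Open Source Code rows describe the paper's core correction paradigm: a small Aligner model conditions on the user's query together with an upstream LLM's draft answer and generates a corrected answer. Below is a minimal inference sketch using Hugging Face `transformers`; the model id `aligner/aligner-7b-v1.0`, the prompt template, and the generation settings are illustrative assumptions, not necessarily the released configuration.

```python
# Minimal sketch of the Aligner correction step:
# (question, draft answer) -> corrected answer.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "aligner/aligner-7b-v1.0"  # assumed Hugging Face model id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

def correct(question: str, draft_answer: str) -> str:
    """Run one correction pass over an upstream model's answer."""
    # Assumed prompt template: the Aligner sees both the query and the
    # upstream model's draft answer, then rewrites the answer.
    prompt = (
        "BEGINNING OF CONVERSATION: USER: Edit the following Question-Answer "
        f"pair to make it more helpful and harmless: {question} | {draft_answer} "
        "ASSISTANT:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=512)
    # Keep only the newly generated tokens (the corrected answer).
    generated = output_ids[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(generated, skip_special_tokens=True)
```

Because the corrector is decoupled from the upstream model, the same Aligner checkpoint can sit behind any of the 11 evaluated LLMs without retraining, which is what the Research Type row's numbers measure.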
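
The Experiment Setup row quotes the AdamW betas (β1 = 0.9, β2 = 0.95). A minimal PyTorch sketch of that optimizer configuration follows, with a stand-in module and placeholder learning rate and weight decay; the actual values are in the paper's Appendix D.

```python
import torch

# Stand-in module; in practice this would be the Aligner model.
model = torch.nn.Linear(8, 8)

# AdamW with the betas quoted in the Experiment Setup row.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=2e-5,            # placeholder assumption, not from the paper
    betas=(0.9, 0.95),  # β1 and β2 as quoted
    weight_decay=0.0,   # placeholder assumption
)
```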
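
The Software Dependencies row names DeepSpeed ZeRO-3 without a version. For orientation, here is a minimal ZeRO-3 configuration expressed as a Python dict; only the stage itself is attested by the paper, and every other value is an illustrative placeholder.

```python
# Minimal DeepSpeed ZeRO-3 config; pass to deepspeed.initialize(config=ds_config).
ds_config = {
    "train_micro_batch_size_per_gpu": 4,  # placeholder
    "bf16": {"enabled": True},            # placeholder precision choice
    "zero_optimization": {
        "stage": 3,                       # ZeRO-3, as named in the paper
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}
```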