Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Aligner: Efficient Alignment by Learning to Correct
Authors: Jiaming Ji, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Tianyi (Alex) Qiu, Juntao Dai, Yaodong Yang
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate performance improvements by deploying the same Aligner model across 11 different LLMs, evaluated on the 3H dimensions (helpfulness, harmlessness, and honesty). Specifically, Aligner-7B has achieved an average improvement of 68.9% in helpfulness and 22.8% in harmlessness across the tested LLMs while also effectively reducing hallucination. |
| Researcher Affiliation | Academia | 1Institute for AI, Peking University 2State Key Laboratory of General Artificial Intelligence, Institute for AI, Peking University |
| Pseudocode | Yes | Algorithm 1 Aligner Pseudocode |
| Open Source Code | Yes | Open Source. We also release all the training codes, Aligner models and 100K Q-A-C dataset, AlignerTails, to empower the community to explore and advance correction paradigms. |
| Open Datasets | Yes | We utilize two open-source preference datasets, HH-RLHF [5] and PKU-SafeRLHF [19, 20] as our preference datasets. Considering that the preference pairs in PKU-SafeRLHF are generated solely by Alpaca-7B, we additionally construct a 50K preference dataset based on these two preference datasets using correction paradigm. |
| Dataset Splits | No | The paper mentions training, evaluation datasets, and various models, but does not explicitly provide the train/validation/test splits by percentage or absolute counts for reproducibility. |
| Hardware Specification | Yes | We conducted all training on NVIDIA A800 × 8 GPUs. |
| Software Dependencies | No | The paper mentions using Python and DeepSpeed ZeRO-3 but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For detailed training parameters, please see Appendix D. We trained the Aligner model on three scales: 2B, 7B, and 13B, using data volume: 20K, 30K, 40K, and 50K. Throughout the training, we used the AdamW optimizer, setting β1 to 0.9 and β2 to 0.95. |
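
The Experiment Setup row can be read as a small configuration grid. The following is a minimal sketch of that grid, not the authors' training code: the `AlignerRun` record and `enumerate_runs` helper are hypothetical names, and pairing every model scale with every data volume is an assumption, since the quoted text does not say which combinations were actually trained. Only the scales, data volumes, optimizer name, and β values come from the table; all other hyperparameters are left out because they are deferred to the paper's Appendix D.

```python
from dataclasses import dataclass
from itertools import product

# Values quoted in the Experiment Setup row above.
MODEL_SCALES = ("2B", "7B", "13B")
DATA_VOLUMES = (20_000, 30_000, 40_000, 50_000)
ADAMW_BETAS = (0.9, 0.95)  # (beta1, beta2)


@dataclass(frozen=True)
class AlignerRun:
    """Hypothetical record describing one training run (illustrative only)."""
    model_scale: str
    num_examples: int
    optimizer: str = "AdamW"
    betas: tuple = ADAMW_BETAS


def enumerate_runs():
    """List candidate runs.

    Assumption: each model scale is paired with each data volume.
    """
    return [AlignerRun(scale, n) for scale, n in product(MODEL_SCALES, DATA_VOLUMES)]


if __name__ == "__main__":
    for run in enumerate_runs():
        print(run)
```

Under the full-grid assumption this yields 3 × 4 = 12 candidate runs; any subset the paper actually reports would need to be checked against its appendix.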