MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models
Authors: Kailai Yang, Zhiwei Liu, Qianqian Xie, Jimin Huang, Tianlin Zhang, Sophia Ananiadou
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that MetaAligner achieves significant and balanced improvements in multi-objective alignments on 10 state-of-the-art policy models, and saves up to 93.63% of GPU training hours compared to previous alignment methods. |
| Researcher Affiliation | Collaboration | Kailai Yang¹, Zhiwei Liu¹, Qianqian Xie², Jimin Huang², Tianlin Zhang¹, Sophia Ananiadou¹. ¹ The University of Manchester; ² The Fin AI. {kailai.yang,zhiwei.liu,sophia.ananiadou}@manchester.ac.uk; {xqq.sincere,zhangtianlin668}@gmail.com; jimin@chancefocus.com |
| Pseudocode | Yes | Algorithm 1 Dynamic objectives reformulation. |
| Open Source Code | Yes | This project is open-sourced here. |
| Open Datasets | Yes | We transfer the following three alignment datasets into dynamic multi-objective datasets: (1) HH-RLHF [3]: a large-scale dataset with 160K prompts and corresponding response pairs... (2) UltraFeedback [7]: a multi-aspect alignment dataset with 64K prompts with preferences obtained from GPT-4... (3) IMHI: we create an alignment dataset on the IMHI dataset [34] targeting interpretable mental health analysis. |
| Dataset Splits | Yes | Train/Val/Test (Dp): 262,719 / 15,000 / 15,000; Train/Val (De): 16,502 / 1,797 |
| Hardware Specification | Yes | GPU hardware: 4 × NVIDIA Tesla A100 80GB GPUs |
| Software Dependencies | No | The paper mentions software such as 'Huggingface Transformers', 'FastChat', and 'FlashAttention-2' but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | Optimizer: Adam; Training epochs: 2; Batch sizes: HH-RLHF 512, UltraFeedback 512, IMHI 128; Max tokens for training, MetaAligner-(1.1B, 7B, 13B): 2048 / 4096 / 4096; Learning rate: 1e-5; Warm-up ratio: 0.05 |
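
The experiment setup in the table can be collected into a small configuration object, which also makes the effect of the 0.05 warm-up ratio concrete. The following is only a minimal sketch: the class name `TrainingSetup`, the helper `warmup_steps`, the dictionary keys, and the rounding rule (ceiling) are assumptions made for illustration and are not taken from the paper or its released code. Only the hyperparameter values come from the table.

```python
import math
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TrainingSetup:
    """Hyperparameters as reported in the table (names are hypothetical)."""
    optimizer: str = "Adam"
    epochs: int = 2
    learning_rate: float = 1e-5
    warmup_ratio: float = 0.05
    # Batch size per training dataset.
    batch_sizes: dict = field(default_factory=lambda: {
        "HH-RLHF": 512,
        "UltraFeedback": 512,
        "IMHI": 128,
    })
    # Maximum training sequence length per model size.
    max_tokens: dict = field(default_factory=lambda: {
        "MetaAligner-1.1B": 2048,
        "MetaAligner-7B": 4096,
        "MetaAligner-13B": 4096,
    })


def warmup_steps(setup: TrainingSetup, dataset: str, num_train_examples: int) -> int:
    """Number of warm-up optimizer steps for one dataset.

    Assumes one optimizer step per batch and rounds partial steps up;
    the paper does not state its rounding rule.
    """
    steps_per_epoch = math.ceil(num_train_examples / setup.batch_sizes[dataset])
    total_steps = steps_per_epoch * setup.epochs
    return math.ceil(total_steps * setup.warmup_ratio)


setup = TrainingSetup()
# 100_000 is a made-up example size, not a figure from the paper.
example_warmup = warmup_steps(setup, "HH-RLHF", 100_000)
```

With a hypothetical 100,000 training examples and a batch size of 512, two epochs amount to 392 optimizer steps, so a 0.05 warm-up ratio covers roughly the first 20 of them.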