MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models

Authors: Kailai Yang, Zhiwei Liu, Qianqian Xie, Jimin Huang, Tianlin Zhang, Sophia Ananiadou

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that MetaAligner achieves significant and balanced improvements in multi-objective alignment on 10 state-of-the-art policy models, and saves up to 93.63% of GPU training hours compared to previous alignment methods.
Researcher Affiliation | Collaboration | Kailai Yang (1), Zhiwei Liu (1), Qianqian Xie (2), Jimin Huang (2), Tianlin Zhang (1), Sophia Ananiadou (1); affiliations: (1) The University of Manchester, (2) The Fin AI; contact: {kailai.yang,zhiwei.liu,sophia.ananiadou}@manchester.ac.uk, {xqq.sincere,zhangtianlin668}@gmail.com, jimin@chancefocus.com
Pseudocode | Yes | Algorithm 1: Dynamic objectives reformulation (a sketch of this procedure appears after the table).
Open Source Code | Yes | This project is open-sourced here.
Open Datasets | Yes | We transfer the following three alignment datasets into dynamic multi-objective datasets: (1) HH-RLHF [3]: a large-scale dataset with 160K prompts and corresponding response pairs... (2) UltraFeedback [7]: a multi-aspect alignment dataset with 64K prompts with preferences obtained from GPT-4... (3) IMHI: we create an alignment dataset on the IMHI dataset [34] targeting interpretable mental health analysis. (A loading sketch for the public datasets follows the table.)
Dataset Splits | Yes | Train/Val/Test (Dp): 262,719 / 15,000 / 15,000; Train/Val (De): 16,502 / 1,797
Hardware Specification | Yes | GPU hardware: 4× NVIDIA Tesla A100 80GB GPUs
Software Dependencies | No | The paper mentions software such as Hugging Face Transformers, FastChat, and FlashAttention 2 but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | Optimizer: Adam; training epochs: 2; batch sizes: HH-RLHF 512, UltraFeedback 512, IMHI 128; max training tokens for MetaAligner-1.1B/7B/13B: 2048/4096/4096; learning rate: 1e-5; warm-up ratio: 0.05 (collected into a configuration sketch after the table).
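
The dynamic objectives reformulation named in the Pseudocode row works by sampling, for each training example, a random subset of alignment objectives and placing their textual descriptions alongside the query and the weaker response, with the preferred response as the training target. Below is a minimal Python sketch of that idea; the objective descriptions, the prompt template, and the function name `reformulate` are illustrative assumptions rather than the paper's exact wording (Algorithm 1 in the paper gives the authoritative procedure).

```python
import random

# Hypothetical objective descriptions; the paper defines its own text for each objective
# (e.g. harmlessness / helpfulness / humour for HH-RLHF).
OBJECTIVES = {
    "harmlessness": "the response should avoid being offensive or discriminatory",
    "helpfulness": "the response should provide useful resources and suggestions",
    "humour": "the response should be cheerful and amusing where appropriate",
}

# Assumed prompt template; MetaAligner's actual template is specified in the paper.
TEMPLATE = (
    "Consider the query and edit the answer to improve it on these objectives: "
    "{objectives} | Query: {query} | Answer: {answer} | Edit:"
)

def reformulate(query: str, dispreferred: str, preferred: str) -> tuple[str, str]:
    """Build one dynamic multi-objective training pair: a random, randomly ordered
    subset of objectives conditions the aligner, which learns to rewrite the weaker
    answer into the preferred one."""
    k = random.randint(1, len(OBJECTIVES))              # dynamic number of objectives
    sampled = random.sample(sorted(OBJECTIVES), k)      # random subset in random order
    objective_text = "; ".join(f"{name}: {OBJECTIVES[name]}" for name in sampled)
    prompt = TEMPLATE.format(objectives=objective_text, query=query, answer=dispreferred)
    return prompt, preferred
```

Because the objective list is plain text in the prompt, the same interface can in principle be filled with unseen objectives at inference time, which is how the paper frames generalizable multi-objective alignment.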
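
For reproduction, the two public preference datasets in the Open Datasets row are commonly distributed on the Hugging Face Hub; the sketch below loads them with the `datasets` library. The Hub IDs are assumptions made for illustration (the paper names the datasets but does not give download paths), and the IMHI-based alignment data is constructed by the authors, so it is not loaded here.

```python
from datasets import load_dataset

# Hub IDs are assumptions for illustration; the paper only names the datasets.
hh_rlhf = load_dataset("Anthropic/hh-rlhf", split="train")            # ~160K chosen/rejected pairs
ultrafeedback = load_dataset("openbmb/UltraFeedback", split="train")  # 64K prompts, GPT-4 preferences

print(hh_rlhf.column_names)        # expect fields such as 'chosen' and 'rejected'
print(ultrafeedback.column_names)  # expect an instruction plus multi-aspect completions
```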
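
The hyperparameters in the Experiment Setup row map naturally onto a Hugging Face `TrainingArguments` object. The sketch below is an assumption-laden illustration, not the authors' training script: the per-device batch size and gradient-accumulation split across the 4 A100s, the AdamW variant, the bf16 flag, and the output path are all guesses; only the epochs, learning rate, warm-up ratio, and effective batch size come from the table.

```python
from transformers import TrainingArguments

# Effective batch size 512 for HH-RLHF / UltraFeedback (128 for IMHI). How that is split
# across 4 x A100 80GB is an assumption: 8 per device x 4 GPUs x 16 accumulation steps = 512.
args = TrainingArguments(
    output_dir="metaaligner-7b",         # hypothetical output path
    num_train_epochs=2,
    learning_rate=1e-5,
    warmup_ratio=0.05,
    optim="adamw_torch",                 # the paper says "Adam"; the exact variant is not specified
    per_device_train_batch_size=8,
    gradient_accumulation_steps=16,
    bf16=True,                           # assumption: mixed precision on A100s
)
# The maximum sequence length (2048 for MetaAligner-1.1B, 4096 for the 7B and 13B aligners)
# is applied at tokenization time rather than through TrainingArguments.
```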