Aligning Language Models with Human Preferences via a Bayesian Approach
Authors: Jiashuo Wang, Haozhao Wang, Shichao Sun, Wenjie Li
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on two human-centric NLG tasks, i.e., emotional support conversation and integrity Rule-of-Thumb generation, show that our method consistently exceeds previous SOTA models in both automatic and human evaluations. |
| Researcher Affiliation | Academia | 1Department of Computing, The Hong Kong Polytechnic University 2School of Computer Science and Technology, Huazhong University of Science and Technology |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our codes are released at https://github.com/wangjs9/Aligned-dPM. |
| Open Datasets | Yes | Dataset and Base Models The benchmark ESConv [22], containing approximately 1k conversations with 31k utterances... We derive human preferences from the Motivational Interviewing Dataset [36]... The MIC dataset [42] comprises about 99k distinct RoTs... |
| Dataset Splits | Yes | The dataset was randomly split into a 9:1 ratio for the training and validation set. |
| Hardware Specification | Yes | We trained models based on MultiESC using two NVIDIA RTX 3090 GPUs, while all other models, including d-PM models, were trained using a single NVIDIA RTX 3090 GPU. |
| Software Dependencies | Yes | Our models were implemented in Python using PyTorch and the transformers (4.16.2) library. |
| Experiment Setup | Yes | When training the aligned models, we aim to retain the same hyperparameters used in the training of the base models. We set the candidate number K to 10. We train each aligned model five times with five different seeds. Subsequently, we test each of the five trained models on the test dataset and compute the average results. ... We set the learning rate to 1 × 10⁻³ for the Blender-Vanilla and Blender-Joint base models, and 3 × 10⁻⁵ for the other models. Additionally, due to GPU memory constraints, we reduced the batch size from 32 to 12 when training the aligned MultiESC. ... The prefix length was set to 10. A batch size of 160 and a learning rate of 5 × 10⁻⁴ were used. |
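
For reference, below is a minimal sketch of the 9:1 random train/validation split described in the Dataset Splits row. The `split_train_val` helper and its arguments are hypothetical illustrations, not the authors' released code.

```python
import random

def split_train_val(examples, val_ratio=0.1, seed=42):
    """Randomly split examples into train/validation sets at a 9:1 ratio."""
    rng = random.Random(seed)
    shuffled = examples[:]  # copy so the original ordering is untouched
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_ratio)
    return shuffled[n_val:], shuffled[:n_val]

# Usage: train_set, val_set = split_train_val(list_of_conversations)
```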
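
The Experiment Setup row reports training each aligned model five times with different random seeds and averaging the test results. The sketch below illustrates that protocol under stated assumptions: the hyperparameter values are taken from the quoted setup, but the seed values, `train_aligned_model`, and `evaluate` are hypothetical placeholders rather than the paper's implementation.

```python
import statistics

# Hyperparameters quoted in the paper's experiment setup.
CONFIG = {
    "candidate_number_K": 10,
    "lr_blender": 1e-3,          # Blender-Vanilla / Blender-Joint base models
    "lr_other": 3e-5,            # all other models
    "batch_size_multiesc": 12,   # reduced from 32 due to GPU memory constraints
    "seeds": [0, 1, 2, 3, 4],    # five seeds; actual values are not specified
}

def run_protocol(train_aligned_model, evaluate, test_set):
    """Train one model per seed, evaluate each on the test set, return the mean score."""
    scores = []
    for seed in CONFIG["seeds"]:
        model = train_aligned_model(seed=seed, config=CONFIG)
        scores.append(evaluate(model, test_set))
    return statistics.mean(scores)
```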