Perplexity-aware Correction for Robust Alignment with Noisy Preferences

Authors: Keyi Kong, Xilie Xu, Di Wang, Jingfeng Zhang, Mohan S. Kankanhalli

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Comprehensive experiments validate that our proposed PerpCorrect can achieve state-of-the-art alignment performance under NPs."
Researcher Affiliation | Academia | Keyi Kong (1), Xilie Xu (2), Di Wang (3), Jingfeng Zhang (4,5), Mohan Kankanhalli (2); 1 Shandong University, 2 National University of Singapore, 3 King Abdullah University of Science and Technology, 4 The University of Auckland, 5 RIKEN Center for Advanced Intelligence Project (AIP)
Pseudocode | Yes | "The algorithm of PerpCorrect is described in Algorithm 2. ... Algorithm 1 Robust Alignment via Perplexity-aware Correction (PerpCorrect) ... Algorithm 2 Perplexity-aware Correction (PerpCorrect)"
Open Source Code | Yes | "Our code is available at PerpCorrect. ... Our training code is open-sourced on GitHub."
Open Datasets | Yes | "We utilize two preference datasets, namely Open Assistant Conversations (OASST1) [17] and Golden HH [7]."
Dataset Splits | Yes | "The processed OASST1 dataset comprises 17,939 training samples and 951 testing samples, and the processed Golden HH dataset consists of 12,066 training samples and 654 testing samples. ... Table 5 illustrates the impact of the number of clean validation data points."
Hardware Specification | Yes | "We utilized the QLoRA method [11] for fine-tuning the LLMs, executed on RTX 4090 GPUs with 24 GB of memory. ... Each experiment, involving a specific method and proportion of NPs, could be completed using a single RTX 4090 GPU within 24 hours on the Golden HH dataset and within 72 hours on the OASST1 dataset."
Software Dependencies | No | The paper mentions using the transformers and TRL libraries and the AdamW optimizer, but does not specify their version numbers, which are required for a reproducible description of software dependencies.
Experiment Setup | Yes | "Hyperparameters were set as follows: lora_rank = 32, lora_dropout = 0.1, and lora_alpha = 16. For SFT, we use the alpaca dataset [30] and set learning_rate = 2e-4 and batch_size = 20. For our PerpCorrect stage II, we set β = 0.1, learning_rate = 1e-3, batch_size = 4, T = 5, and α = 0.02. For our PerpCorrect stage III and all other alignment methods, we set β = 0.1, learning_rate = 3e-4, and batch_size = 20."
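For reference, the reported hyperparameters can be gathered into a single configuration sketch. The dictionary layout and stage names below are illustrative (not the authors' code); all values are taken verbatim from the quoted experiment setup, and the roles of T and α are defined in the paper itself.

```python
# Hyperparameter summary reconstructed from the reported experiment setup.
# Grouping and variable names are illustrative, not the authors' code.

LORA_CONFIG = {
    "lora_rank": 32,
    "lora_dropout": 0.1,
    "lora_alpha": 16,
}

SFT = {  # supervised fine-tuning on the Alpaca dataset [30]
    "learning_rate": 2e-4,
    "batch_size": 20,
}

PERPCORRECT_STAGE_II = {  # perplexity-aware correction stage
    "beta": 0.1,
    "learning_rate": 1e-3,
    "batch_size": 4,
    "T": 5,        # as reported; role defined in the paper
    "alpha": 0.02,  # as reported; role defined in the paper
}

PERPCORRECT_STAGE_III = {  # final alignment; same settings used for baselines
    "beta": 0.1,
    "learning_rate": 3e-4,
    "batch_size": 20,
}
```

Collecting the values this way makes it easy to spot that stage II uses a smaller batch size (4) and a higher learning rate (1e-3) than either SFT or the final alignment stage.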