Language Model Alignment with Elastic Reset

Authors: Michael Noukhovitch, Samuel Lavoie, Florian Strub, Aaron C. Courville

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that fine-tuning language models with Elastic Reset leads to state-of-the-art performance on a small-scale pivot-translation benchmark, outperforms all baselines in a medium-scale RLHF-like IMDB mock sentiment task and leads to a more performant and more aligned technical QA chatbot with LLaMA-7B.
Researcher Affiliation | Collaboration | Michael Noukhovitch (Mila, Université de Montréal); Samuel Lavoie (Mila, Université de Montréal); Florian Strub (Google DeepMind); Aaron Courville (Mila, Université de Montréal; CIFAR AI Chair)
Pseudocode | No | The paper does not contain any pseudocode blocks or algorithms labeled as such.
Open Source Code | Yes | Code available at github.com/mnoukhov/elastic-reset.
Open Datasets | Yes | We first investigate the pivot-translation benchmark of Lee et al. [2019], which was previously popular for small-scale methods countering drift. Two translation models, French to English (FR→EN) and English to German (EN→DE), are pretrained on IWSLT [Cettolo et al., 2012]. Then, the models are finetuned on translating French to German through English (FR→EN→DE) but given only paired French and German data from Multi30k [Elliott et al., 2016, 2017]. (An illustrative dataset-loading sketch follows the table.)
Dataset Splits | No | The paper mentions using a "held-out FR→EN validation set" and refers to "validation scores" throughout, but does not provide specific details on the size, percentage, or method of creating these validation splits.
Hardware Specification | Yes | All experiments are run with 5 seeds where each run uses a single 16G V100 GPU. ... All experiments are run with 5 seeds where each run uses a single 40G A100 GPU. ... RL training is run on 4x 80G A100 GPUs for 20 hours.
Software Dependencies | No | The paper mentions several software libraries used (e.g., fairseq, PyTorch, RL4LMs, Hugging Face trl, peft, Accelerate, transformers, datasets) but does not provide specific version numbers for any of them.
Experiment Setup | Yes | ELASTIC RESET is implemented on top of REINFORCE with a very minimal KL penalty β = 0.001 and uses an EMA decay η = 0.99. We run all models for 50k updates and reset every 23k steps to get 2 resets / 3 iterations within a run. ... We implement ELASTIC RESET on top of PPO with an EMA decay rate of 0.995 and greatly reduce the KL coefficient β = 0.001 to allow the model to drift more, then reset every 17 epochs such that we get two resets / three iterations during our training. (A training-loop sketch follows the table.)
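
To make the quoted experiment setup concrete, here is a minimal sketch of the reset-and-EMA loop, assuming a PyTorch policy. The RL step itself (REINFORCE or PPO with the small KL penalty) is abstracted into a caller-supplied `rl_update` callback, and the default hyperparameters mirror the translation setup quoted above (EMA decay 0.99, 50k updates, resets every 23k steps). This is an illustration based on the paper's description, not the authors' implementation.

```python
import copy
import torch


@torch.no_grad()
def ema_update(ema_model, online_model, decay=0.99):
    # Parameter-wise EMA: ema <- decay * ema + (1 - decay) * online
    for p_ema, p_online in zip(ema_model.parameters(), online_model.parameters()):
        p_ema.mul_(decay).add_(p_online, alpha=1.0 - decay)


def train_with_elastic_reset(policy, rl_update, total_updates=50_000,
                             reset_interval=23_000, ema_decay=0.99):
    """RL fine-tuning with Elastic Reset (illustrative sketch).

    `policy` is a torch.nn.Module and `rl_update(policy)` performs one RL
    update (e.g. REINFORCE or PPO with a small KL penalty); both are
    supplied by the caller and are not specified in the quoted setup.
    """
    init_model = copy.deepcopy(policy)   # frozen copy of the pretrained policy
    ema_model = copy.deepcopy(policy)    # running EMA of the online policy
    for step in range(1, total_updates + 1):
        rl_update(policy)
        ema_update(ema_model, policy, decay=ema_decay)
        if step % reset_interval == 0:
            # Elastic Reset: online policy <- EMA weights, then EMA <- initial model
            policy.load_state_dict(ema_model.state_dict())
            ema_model.load_state_dict(init_model.state_dict())
    return policy
```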
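
For the translation benchmark, the datasets are public and could, for example, be pulled with the Hugging Face datasets library that the paper lists among its dependencies. The Hub IDs and configuration names below are assumptions for illustration only (the paper and its repository may fetch and preprocess IWSLT and Multi30k differently), and the Lee et al. [2019] pivot preprocessing is omitted.

```python
from datasets import load_dataset

# NOTE: Hub IDs / config names are assumed for illustration and are not taken
# from the paper or its repository.
iwslt_fr_en = load_dataset("iwslt2017", "iwslt2017-fr-en")  # pretraining data for FR->EN
iwslt_en_de = load_dataset("iwslt2017", "iwslt2017-en-de")  # pretraining data for EN->DE
multi30k = load_dataset("bentrevett/multi30k")              # paired sentences; substitute a source with FR/DE pairs

print(iwslt_fr_en["train"][0])  # inspect one parallel example
```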