Language Model Alignment with Elastic Reset
Authors: Michael Noukhovitch, Samuel Lavoie, Florian Strub, Aaron C. Courville
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that fine-tuning language models with Elastic Reset leads to state-of-the-art performance on a small scale pivot-translation benchmark, outperforms all baselines in a medium-scale RLHF-like IMDB mock sentiment task and leads to a more performant and more aligned technical QA chatbot with LLaMA-7B. |
| Researcher Affiliation | Collaboration | Michael Noukhovitch (Mila, Université de Montréal); Samuel Lavoie (Mila, Université de Montréal); Florian Strub (Google DeepMind); Aaron Courville (Mila, Université de Montréal; CIFAR AI Chair) |
| Pseudocode | No | The paper does not contain any pseudocode blocks or algorithms labeled as such. |
| Open Source Code | Yes | Code available at github.com/mnoukhov/elastic-reset. |
| Open Datasets | Yes | We first investigate the pivot-translation benchmark of Lee et al. [2019], which was previously popular for small-scale methods countering drift. Two translation models, French to English (FR→EN) and English to German (EN→DE), are pretrained on IWSLT [Cettolo et al., 2012]. Then, the models are finetuned on translating French to German through English (FR→EN→DE) but given only paired French and German data from Multi30k [Elliott et al., 2016, 2017]. |
| Dataset Splits | No | The paper mentions using a "held-out FR→EN validation set" and refers to "validation scores" throughout, but does not provide specific details on the size, percentage, or method of creating these validation splits. |
| Hardware Specification | Yes | All experiments are run with 5 seeds where each run uses a single 16G V100 GPU. ... All experiments are run with 5 seeds where each run uses a single 40G A100 GPU. ... RL training is run on 4x 80G A100 GPUs for 20 hours |
| Software Dependencies | No | The paper mentions several software libraries used (e.g., fairseq, Pytorch, RL4LMs, Hugging Face trl, peft, Accelerate, transformers, datasets) but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | ELASTIC RESET is implemented on top of REINFORCE with a very minimal KL penalty β = 0.001 and uses an EMA decay η = 0.99. We run all models for 50k updates and reset every 23k steps to get 2 resets / 3 iterations within a run. ... We implement ELASTIC RESET on top of PPO with an EMA decay rate of 0.995 and greatly reduce the KL coefficient β = 0.001 to allow the model to drift more, then reset every 17 epochs such that we get two resets / three iterations during our training. (A sketch of the reset mechanism follows the table.) |
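
The Pseudocode row notes that the paper ships no algorithm block, and the Experiment Setup row describes the method only in prose. For orientation, the following is a minimal PyTorch-style sketch of the core Elastic Reset mechanism as that prose describes it: an EMA of the online policy is maintained throughout RL fine-tuning, and at each reset the online policy is set back to the EMA while the EMA itself is re-anchored to the initial pretrained weights. The loop wiring and names (`policy`, `rl_update`, `reset_interval`) are illustrative assumptions, not the authors' implementation.

```python
import torch


@torch.no_grad()
def ema_update(ema_model, online_model, decay=0.99):
    """EMA step: ema <- decay * ema + (1 - decay) * online."""
    for ema_p, online_p in zip(ema_model.parameters(), online_model.parameters()):
        ema_p.mul_(decay).add_(online_p, alpha=1.0 - decay)


@torch.no_grad()
def elastic_reset(online_model, ema_model, init_model):
    """One Elastic Reset: the online policy jumps back to its EMA,
    and the EMA is re-anchored to the initial (pretrained) weights."""
    online_model.load_state_dict(ema_model.state_dict())
    ema_model.load_state_dict(init_model.state_dict())


# Hypothetical training-loop wiring (rl_update stands in for a
# REINFORCE/PPO step with a small KL penalty, e.g. beta = 0.001):
#
#   init_model = copy.deepcopy(policy)  # frozen pretrained weights
#   ema_model = copy.deepcopy(policy)
#   for step in range(total_steps):     # e.g. 50k updates
#       rl_update(policy, kl_coef=0.001)
#       ema_update(ema_model, policy, decay=0.99)
#       if (step + 1) % reset_interval == 0:  # e.g. every 23k steps
#           elastic_reset(policy, ema_model, init_model)
```

With this wiring, the β = 0.001 KL coefficient only lightly constrains drift between resets; it is the EMA-plus-reset cycle, rather than the KL penalty, that pulls the policy back toward the pretrained model.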