Aligning Language Models with Preferences through $f$-divergence Minimization

Authors: Dongyoung Go, Tomasz Korbak, Germán Kruszewski, Jos Rozen, Nahyeon Ryu, Marc Dymetman

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments to investigate the effect of model size on our approach using the scalar preference task described in Sec. 4.1. Specifically, we gradually increase the model size from GPT-2 small (117M parameters) to GPT-2 XL (1.5B parameters) while tracking two metrics: alignment score, measured by the expected reward E_{π_θ}[φ(x)], and diversity, measured by the entropy of π_θ. Figure 6 demonstrates that the alignment score steadily improves as the model size increases. (A sampling-based estimate of these two metrics is sketched after the table.)
Researcher Affiliation | Collaboration | Dongyoung Go (1,2), Tomasz Korbak (3), Germán Kruszewski (4), Jos Rozen (4), Nahyeon Ryu (1), Marc Dymetman (5). Affiliations: 1 Naver Corp; 2 Yonsei University; 3 University of Sussex; 4 Naver Labs Europe; 5 Independent Researcher.
Pseudocode | Yes | Algorithm 1: f-DPG. (An illustrative sketch of one f-DPG update step appears after the table.)
Open Source Code | No | The paper states that models were implemented using PyTorch and Hugging Face Transformers and that pretrained models are available on the Hugging Face Model Hub, but it does not provide a link to, or a statement confirming the release of, the authors' f-DPG implementation.
Open Datasets | Yes | Unless specified otherwise, we use a pretrained GPT-2 small (Radford et al., 2019) with 117M parameters as the initial model. We sample source documents from the CNN/Daily Mail dataset (Nallapati et al., 2016). For the first experiment, we use GPT-2 small as the initial model, additionally fine-tuned on the WikiBio dataset (Lebret et al., 2016). We condition on Python function signatures in the Python150 dataset (Raychev et al., 2016). We set π_θ as a GPT-2 model with 117M parameters fine-tuned on the IMDB dataset (Maas et al., 2011).
Dataset Splits | No | The paper mentions 'disjoint train/test sets' for some tasks but does not provide specific percentages, absolute counts, or the methodology used to create the training, validation, and test splits, information that is necessary for reproducibility.
Hardware Specification | Yes | Training was performed on an Nvidia V100 GPU, with the longest run taking approximately 2 days.
Software Dependencies | No | The paper states: 'All models were implemented using PyTorch (Paszke et al., 2019) and Hugging Face Transformers (Wolf et al., 2020) with the Adam optimizer (Kingma & Ba, 2015).' While it names the software, it does not specify exact version numbers for PyTorch or Hugging Face Transformers (e.g., PyTorch 1.9, Transformers 4.10), which are crucial for reproducibility.
Experiment Setup | Yes | Table 2 lists 'Experiment Hyperparameters', including 'Common batch size = 258, optimizer = Adam, learning rate schedule = constant with warmup (100 epochs)'. Specific learning rates, maximum lengths, and total epochs are also provided for individual experiments within the table. (One possible reading of this optimizer and schedule setup is sketched after the table.)
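
Since the authors' f-DPG implementation does not appear to be publicly released, the following is a minimal sketch of the kind of update Algorithm 1 describes, written against a toy categorical policy rather than a GPT-2 policy. It relies on the identity ∇_θ D_f(p ‖ π_θ) = E_{x∼π_θ}[(f(r(x)) − r(x) f′(r(x))) ∇_θ log π_θ(x)] with r(x) = p(x)/π_θ(x), obtained by differentiating D_f(p ‖ π_θ) = E_{π_θ}[f(p/π_θ)]. The exponentially tilted target P(x) = π_0(x) exp(β φ(x)), the reverse-KL choice of f, the toy reward φ, and all constants are illustrative assumptions, not the paper's code or settings.

```python
# Illustrative f-DPG-style update on a toy categorical policy (not the
# authors' implementation). The weight w(x) = f(r) - r*f'(r), with
# r(x) = p(x)/pi_theta(x), comes from differentiating
# D_f(p || pi_theta) = E_{pi_theta}[f(p/pi_theta)].
import torch

torch.manual_seed(0)
K = 8                                            # toy "vocabulary" of 8 outcomes
logits = torch.zeros(K, requires_grad=True)      # policy parameters theta
opt = torch.optim.Adam([logits], lr=1e-1)

# Unnormalized target P(x) = pi_0(x) * exp(beta * phi(x)): a reference
# distribution exponentially tilted by a scalar reward (assumed form).
pi0 = torch.full((K,), 1.0 / K)
phi = torch.linspace(0.0, 1.0, K)                # toy scalar reward
beta = 3.0
P = pi0 * torch.exp(beta * phi)

# f defining the divergence; here reverse KL(pi_theta || p), i.e. f(t) = -log t,
# so the per-sample weight is f(r) - r*f'(r) = 1 - log r.
def weight(r):
    return 1.0 - torch.log(r)

for step in range(200):
    pi = torch.softmax(logits, dim=0)
    x = torch.multinomial(pi.detach(), num_samples=256, replacement=True)

    # Importance-sampling estimate of the partition function Z of p = P / Z.
    Z_hat = (P[x] / pi.detach()[x]).mean()

    r = P[x] / (Z_hat * pi.detach()[x])          # density ratio p / pi_theta
    w = weight(r).detach()                       # REINFORCE-style weights
    loss = (w * torch.log(pi[x])).mean()         # grad = E[w * grad log pi]

    opt.zero_grad()
    loss.backward()
    opt.step()

print(torch.softmax(logits.detach(), dim=0))     # mass shifts toward high reward
print(P / P.sum())                               # normalized target, for comparison
```

With the forward-KL choice f(t) = t log t, the weight f(r) − r f′(r) reduces to −r, recovering a DPG-style importance-weighted estimator; swapping the weight function is the only change needed to target a different f-divergence in this sketch.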
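The two metrics quoted in the Research Type row, expected reward E_{π_θ}[φ(x)] and entropy, can both be estimated from policy samples. The sketch below does this for the same toy categorical setup as above; in the paper's experiments the samples would be full sequences drawn from a GPT-2 policy and φ the task-specific reward, so the function here is only an illustrative stand-in.

```python
# Monte Carlo estimates of the two reported metrics: alignment score
# E_{pi_theta}[phi(x)] and diversity measured by the entropy of pi_theta.
import torch

def alignment_and_entropy(pi, phi, n_samples=4096):
    """pi: 1-D tensor of outcome probabilities; phi: per-outcome scalar reward."""
    x = torch.multinomial(pi, num_samples=n_samples, replacement=True)
    expected_reward = phi[x].mean()              # E_{pi}[phi(x)]
    entropy = -torch.log(pi[x]).mean()           # -E_{pi}[log pi(x)]
    return expected_reward.item(), entropy.item()

pi = torch.softmax(torch.randn(8), dim=0)
phi = torch.linspace(0.0, 1.0, 8)
print(alignment_and_entropy(pi, phi))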
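One possible reading of the quoted training configuration (Adam with a constant-with-warmup learning-rate schedule on top of GPT-2 small) is sketched below using Hugging Face Transformers' get_constant_schedule_with_warmup. The learning rate and the interpretation of the 100-unit warmup as 100 scheduler steps are placeholders, not values confirmed by the paper's Table 2.

```python
# Sketch of the quoted optimizer/schedule setup; values are placeholders.
import torch
from transformers import GPT2LMHeadModel, get_constant_schedule_with_warmup

model = GPT2LMHeadModel.from_pretrained("gpt2")              # GPT-2 small, 117M
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)    # placeholder lr
scheduler = get_constant_schedule_with_warmup(optimizer, num_warmup_steps=100)

# Inside the training loop, after each update:
#   optimizer.step(); scheduler.step(); optimizer.zero_grad()
```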