Aligning Language Models with Preferences through $f$-divergence Minimization
Authors: Dongyoung Go, Tomasz Korbak, Germán Kruszewski, Jos Rozen, Nahyeon Ryu, Marc Dymetman
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments to investigate the effect of model size on our approach using the scalar preference task described in Sec. 4.1. Specifically, we gradually increase the model size from GPT-2 small (117M parameters) to xl (1.5B parameters) while tracking two important metrics: alignment score, which is measured by the expected reward $\mathbb{E}_{\pi_\theta}[\phi(x)]$, and diversity, which is measured by the entropy. Figure 6 demonstrates that the alignment score steadily improves as the model size increases. (A minimal sketch of estimating these two metrics appears after this table.) |
| Researcher Affiliation | Collaboration | Dongyoung Go (1,2), Tomasz Korbak (3), Germán Kruszewski (4), Jos Rozen (4), Nahyeon Ryu (1), Marc Dymetman (5); affiliations: 1 Naver Corp, 2 Yonsei University, 3 University of Sussex, 4 Naver Labs Europe, 5 Independent Researcher. |
| Pseudocode | Yes | Algorithm 1 f-DPG (a hedged sketch of an f-divergence policy-gradient step appears after this table) |
| Open Source Code | No | The paper states that models were implemented using PyTorch and Hugging Face Transformers and that pretrained models are available on Huggingface Model Hub. However, it does not explicitly provide a link or statement confirming the release of their specific f-DPG implementation code. |
| Open Datasets | Yes | Unless specified otherwise, we use a pretrained GPT-2 small (Radford et al., 2019) with 117M parameters for the initial model. We sample source documents from the CNN/Daily Mail dataset (Nallapati et al., 2016). For the first experiment, we use GPT-2 small as the initial model, additionally fine-tuned on the WikiBio dataset (Lebret et al., 2016). We condition on Python function signatures in the Python150 dataset (Raychev et al., 2016). We set $\pi_\theta$ as a GPT-2 model with 117M parameters fine-tuned on the IMDB dataset (Maas et al., 2011). |
| Dataset Splits | No | The paper mentions 'disjoint train/test sets' for some tasks but does not provide specific percentages, absolute counts, or detailed methodology for how the training, validation, and testing splits were created, which is necessary for reproducibility. |
| Hardware Specification | Yes | Training was performed on Nvidia V100 GPU, with the longest run taking approximately 2 days. |
| Software Dependencies | No | The paper states: 'All models were implemented using PyTorch (Paszke et al., 2019) and Hugging Face Transformers (Wolf et al., 2020) with the Adam optimizer (Kingma & Ba, 2015).' While it names the software, it does not specify exact version numbers for PyTorch or Hugging Face Transformers (e.g., PyTorch 1.9, Transformers 4.10), which are crucial for reproducibility. |
| Experiment Setup | Yes | Table 2 lists 'Experiment Hyperparameters', including 'Common batch size = 258, optimizer = Adam, learning rate schedule = constant with warmup (100 epochs)'. Specific learning rates, maximum lengths, and total epochs are also provided per experiment within the table. (A minimal optimizer/schedule configuration sketch follows the table.) |
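
The 'Research Type' row reports the two evaluation metrics used in the model-size study: alignment score, the expected reward $\mathbb{E}_{\pi_\theta}[\phi(x)]$, and diversity, measured by entropy. The snippet below is a minimal Monte-Carlo sketch of how these two quantities can be estimated from policy samples; `sample_and_score` is a hypothetical helper (not from the paper) that returns rewards and sequence log-probabilities for a batch drawn from $\pi_\theta$.

```python
# Monte-Carlo estimates of the two evaluation metrics described in the
# "Research Type" row: alignment score E_{pi_theta}[phi(x)] and diversity
# (entropy of pi_theta). `sample_and_score` is a hypothetical helper that,
# for a batch of samples x ~ pi_theta, returns the reward phi(x) and the
# sequence log-probability log pi_theta(x).
import torch

def evaluate_policy(sample_and_score, num_batches: int = 10):
    rewards, logps = [], []
    for _ in range(num_batches):
        phi_x, logp_x = sample_and_score()       # tensors of shape [batch]
        rewards.append(phi_x)
        logps.append(logp_x)
    rewards = torch.cat(rewards)
    logps = torch.cat(logps)
    alignment_score = rewards.mean().item()      # E_{pi_theta}[phi(x)]
    entropy_estimate = (-logps).mean().item()    # H(pi_theta) ~ -E[log pi_theta(x)]
    return alignment_score, entropy_estimate
```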
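The 'Pseudocode' row refers to Algorithm 1 (f-DPG). The sketch below illustrates one on-policy update step derived from the generic identity $D_f(p \Vert \pi_\theta) = \mathbb{E}_{\pi_\theta}[f(p(x)/\pi_\theta(x))]$, instantiated for forward KL ($f(t) = t\log t$). It is a hedged approximation rather than the paper's exact Algorithm 1 (proposal handling, baselines, and the normalization-constant estimator may differ), and the helpers `sample_policy`, `log_prob_policy`, and `log_unnormalized_target` are hypothetical.

```python
# Hedged on-policy sketch of an f-divergence policy-gradient step.
# Gradient identity used: grad D_f(p || pi_theta)
#   = E_{x ~ pi_theta}[ (f(r) - r f'(r)) * grad log pi_theta(x) ],  r = p(x)/pi_theta(x).
# For f(t) = t log t (forward KL) the coefficient reduces to -r, recovering the
# importance-weighted DPG update.
import math
import torch

def f_kl(r):          # f(t) = t * log(t)  -> forward KL(p || pi_theta)
    return r * torch.log(r)

def f_prime_kl(r):    # f'(t) = log(t) + 1
    return torch.log(r) + 1.0

def fdpg_step(policy, optimizer, batch_size=32):
    x = sample_policy(policy, batch_size)          # x ~ pi_theta (hypothetical helper)
    log_pi = log_prob_policy(policy, x)            # log pi_theta(x), differentiable
    with torch.no_grad():
        log_P = log_unnormalized_target(x)         # log P(x), unnormalized EBM target
        log_w = log_P - log_pi                     # importance weights P(x)/pi_theta(x)
        log_Z = torch.logsumexp(log_w, dim=0) - math.log(batch_size)  # crude Z estimate
        r = torch.exp(log_w - log_Z)               # r = p(x)/pi_theta(x)
        coeff = f_kl(r) - r * f_prime_kl(r)        # per-sample gradient coefficient
    loss = (coeff * log_pi).mean()                 # surrogate whose gradient matches grad D_f
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Other divergences studied in the paper (reverse KL, total variation, Jensen-Shannon) would enter this sketch only through different choices of `f` and its derivative.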
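The 'Experiment Setup' and 'Software Dependencies' rows name PyTorch, Hugging Face Transformers, and Adam with a constant-with-warmup learning-rate schedule. The snippet below is a minimal configuration sketch under those assumptions; the learning rate and warmup length are placeholders rather than values from the paper's Table 2.

```python
# Minimal sketch of the reported optimizer/schedule setup: Adam plus a
# constant-with-warmup schedule, built with PyTorch and Hugging Face
# Transformers. Learning rate and warmup steps are placeholders.
import torch
from transformers import AutoModelForCausalLM, get_constant_schedule_with_warmup

model = AutoModelForCausalLM.from_pretrained("gpt2")        # GPT-2 small (117M)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)   # placeholder learning rate
scheduler = get_constant_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,   # stand-in for the paper's warmup setting
)

# Inside the training loop, step both after each optimizer update:
#   optimizer.step(); scheduler.step(); optimizer.zero_grad()
```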