Understanding the Learning Dynamics of Alignment with Human Feedback
Authors: Shawn Im, Yixuan Li
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our work provides an initial attempt to theoretically analyze the learning dynamics of human preference alignment. We formally show how the distribution of preference datasets influences the rate of model updates and provide rigorous guarantees on the training accuracy. Our theory also reveals an intricate phenomenon where the optimization is prone to prioritizing certain behaviors with higher preference distinguishability. We empirically validate our findings on contemporary LLMs and alignment tasks, reinforcing our theoretical insights and shedding light on considerations for future alignment approaches. |
| Researcher Affiliation | Academia | Department of Computer Sciences, University of Wisconsin-Madison. Correspondence to: Shawn Im <shawnim@cs.wisc.edu>, Yixuan Li <sharonli@cs.wisc.edu>. |
| Pseudocode | No | The paper includes mathematical formulations and descriptions of algorithms (RLHF, DPO) but does not present any structured pseudocode or algorithm blocks labeled as such. |
| Open Source Code | Yes | Furthermore, we are committed to enhancing reproducibility and broader applicability by releasing our code publicly, which is available here. |
| Open Datasets | Yes | For training, we leverage Anthropic's Persona dataset (Perez et al., 2022), which encompasses diverse types of personas (https://github.com/anthropics/evals/tree/main/persona). We verify that this behavior occurs in practice by using the HH-RLHF dataset (Bai et al., 2022a) in Appendix C. (A hedged data-loading sketch for the Persona files appears below the table.) |
| Dataset Splits | No | The paper mentions training and testing, and loss curves for both, but it does not explicitly provide details about the dataset splits (e.g., percentages or counts for training, validation, and test sets). |
| Hardware Specification | No | The paper mentions the use of 'Llama-2-7B model' and 'Mistral-7B model' for experiments, but it does not specify any particular hardware components such as GPU models, CPU types, or memory used. |
| Software Dependencies | No | The paper mentions using the 'AdamW optimizer' and 'LoRA' configuration parameters (r=8, α=32, 0.05 dropout), but it does not list any specific software libraries (e.g., PyTorch, TensorFlow) or their version numbers. (A hedged LoRA configuration sketch appears below the table.) |
| Experiment Setup | Yes | All of the following experiments are conducted with full fine-tuning on the Llama-2-7B model with the AdamW optimizer (Loshchilov & Hutter, 2018). The learning rate is 1e-5, and β = 0.01. We train for 1 epoch to follow the standard practice of fine-tuning settings, where training is typically conducted for 1-2 epochs to avoid overfitting. (A minimal sketch of this training objective appears below the table.) |
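
The Persona files referenced in the Open Datasets row are plain JSONL. The snippet below is a minimal sketch of turning one such file into (prompt, chosen, rejected) preference pairs for alignment training; the field names (`question`, `answer_matching_behavior`, `answer_not_matching_behavior`), file path, and pairing scheme are assumptions based on the public repository layout, not details taken from the paper.

```python
import json

def load_persona_pairs(path):
    """Build (prompt, chosen, rejected) preference triples from one Persona JSONL file.

    Field names are assumed from the public anthropics/evals repository layout;
    adjust them if the file format differs.
    """
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            pairs.append({
                "prompt": record["question"],
                # Treat the behavior-matching answer as preferred; swap the two
                # fields to align the model against the behavior instead.
                "chosen": record["answer_matching_behavior"],
                "rejected": record["answer_not_matching_behavior"],
            })
    return pairs

# Hypothetical usage with a locally cloned copy of the evals repository:
# pairs = load_persona_pairs("evals/persona/agreeableness.jsonl")
```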
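
For the LoRA numbers quoted in the Software Dependencies row (r=8, α=32, dropout 0.05), a configuration along the following lines would reproduce the adapter setup. The paper does not name a library, so the use of Hugging Face peft/transformers and the choice of target modules are assumptions; this is a sketch, not the authors' code.

```python
# Minimal sketch, assuming Hugging Face transformers + peft (neither library is
# named in the paper); target_modules is an assumed choice for Llama-style models.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

lora_config = LoraConfig(
    r=8,                # rank reported in the paper
    lora_alpha=32,      # alpha reported in the paper
    lora_dropout=0.05,  # dropout reported in the paper
    target_modules=["q_proj", "v_proj"],  # assumption; not stated in the paper
    task_type="CAUSAL_LM",
)

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # confirms only the adapter weights are trainable
```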
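
The hyperparameters in the Experiment Setup row (AdamW, learning rate 1e-5, β = 0.01, one epoch) correspond to the standard DPO objective. The function below is a minimal PyTorch sketch of that loss, assuming per-sequence log-probabilities already summed over response tokens; it illustrates the objective rather than reproducing the authors' training code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.01):
    """Standard DPO loss on summed per-sequence log-probabilities.

    beta=0.01 matches the value quoted in the Experiment Setup row.
    """
    # Log-ratio of chosen vs. rejected responses under the policy and the
    # frozen reference model.
    policy_logratio = policy_chosen_logps - policy_rejected_logps
    ref_logratio = ref_chosen_logps - ref_rejected_logps
    # -log sigma(beta * (policy margin - reference margin)), averaged over the batch.
    return -F.logsigmoid(beta * (policy_logratio - ref_logratio)).mean()

# Dummy batch of four sequence-level log-probabilities, for illustration only.
loss = dpo_loss(torch.tensor([-10.0, -12.0, -9.0, -11.0]),
                torch.tensor([-11.0, -10.0, -13.0, -12.0]),
                torch.tensor([-10.5, -11.0, -10.0, -11.5]),
                torch.tensor([-10.5, -11.0, -10.0, -11.5]))
```

Under the quoted setup, the model parameters would then be updated with `torch.optim.AdamW(model.parameters(), lr=1e-5)` for a single pass over the preference data.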