Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Can DPO Learn Diverse Human Values? A Theoretical Scaling Law

Authors: Shawn Im, Sharon Li

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	This paper introduces a new theoretical framework to analyze how generalization scales with value diversity and sample quantity in models trained with direct preference optimization. Our framework rigorously assesses how well models generalize after a finite number of gradient steps, reflecting real-world LLM training practices. By analyzing the reward margin associated with each sample and its trajectory throughout training, we provide a bound on the generalization error that demonstrates the challenges of effectively learning a wide set of concepts or values. These insights are empirically validated on contemporary LLMs, underscoring the practical relevance of our theory.
Researcher Affiliation	Academia	Shawn Im Sharon Li Department of Computer Sciences University of Wisconsin-Madison EMAIL
Pseudocode	No	The paper does not contain explicit pseudocode or algorithm blocks. The methods are described through mathematical formulations and textual descriptions.
Open Source Code	Yes	5. Open access to data and code Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We have open-sourced the code here.
Open Datasets	Yes	To ground our theoretical analysis, we begin with a concrete example from Anthropic's persona dataset [40], which encompasses diverse types of human values.
Dataset Splits	Yes	For each persona, we randomly sample a subset of 90% of the statements for training, and use the remaining 10% for testing.
Hardware Specification	Yes	Software and hardware. We train with 4 A100 80GB GPUs using the TRL library [104] and Huggingface library [105] for full fine-tuning, generate embeddings with the Huggingface library and 1 A100 80GB GPU, and perform last-layer training on 1 A100 80GB GPU.
Software Dependencies	No	The paper mentions 'TRL library [104]' and 'Huggingface library [105]' but does not provide specific version numbers for these software components.
Experiment Setup	Yes	Training setup. For all full fine-tuning training runs, we use the AdamW optimizer with a learning rate of 10^-5 for Llama models and 10^-6.5 for Qwen and Mistral, with no warm-up steps and a constant learning rate. We train on 4 GPUs with a batch size of 32 per device. For last-layer training runs, we use the Adam optimizer with a learning rate of 10^-3. For all experiments, we use β = 0.01.
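
The Dataset Splits and Experiment Setup rows above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the seed, and the use of plain floats for log-probabilities are assumptions made for illustration. The split follows the reported 90%/10% per-persona sampling, and the loss is the standard DPO objective for a single preference pair, evaluated with the reported β = 0.01.

```python
import math
import random

BETA = 0.01  # value reported in the Experiment Setup row


def split_statements(statements, train_frac=0.9, seed=0):
    """Randomly split one persona's statements into train and test lists.

    The seed is an assumption; the source does not report one.
    """
    rng = random.Random(seed)
    shuffled = list(statements)
    rng.shuffle(shuffled)
    n_train = int(round(train_frac * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected,
             beta=BETA):
    """Standard DPO loss for one preference pair, plus its reward margin.

    Inputs are sequence log-probabilities under the trained policy and the
    frozen reference model (hypothetical scalar inputs for illustration).
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)) written as log(1 + exp(-margin)); adequate for
    # the small margins in this sketch, not hardened against overflow.
    loss = math.log1p(math.exp(-margin))
    return loss, margin


if __name__ == "__main__":
    train, test = split_statements([f"statement {i}" for i in range(100)])
    print(len(train), len(test))
    print(dpo_loss(-1.0, -2.0, -1.5, -1.5))
```

A positive margin means the policy prefers the chosen response more strongly than the reference model does. With β = 0.01, margins stay small unless the policy moves far from the reference.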