Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Direct Alignment with Heterogeneous Preferences
Authors: Ali Shirali, Arash Nasr-Esfahany, Abdullah Alomar, Parsa Mirtaheri, Rediet Abebe, Ariel D Procaccia
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 7 Experiments We provide empirical evidence for our claims throughout the paper. We first extend our sensitivity example in Sec. 4.3 to a real-world preference dataset in Fig. 4. Using 19 Pew surveys, we find that NBC rankings change under shifts from uniform sampling in 20% of cases. Notably, these changes require only modest shifts: In half the cases, a total variation distance of less than 0.23 from uniform is enough to alter the rankings. We defer the details to Appendix B. In Sec. 7.1, we simulate DPO and our proposed improvements in a small-scale environment where we can visualize and compare the resulting policies. Finally, we scale this experiment in Sec. 7.2 by fine-tuning large language models, illustrating the extent of possible improvement over DPO. |
| Researcher Affiliation | Collaboration | Ali Shirali UC Berkeley Arash Nasr-Esfahany* MIT Abdullah Alomar MIT & Ikigai Labs Parsa Mirtaheri UC San Diego Rediet Abebe ELLIS Institute, MPI for Intelligent Systems, & Tübingen AI Center Ariel Procaccia Harvard University |
| Pseudocode | No | The paper does not contain any sections explicitly labeled 'Pseudocode' or 'Algorithm', nor are there any structured blocks of text formatted like pseudocode or an algorithm. |
| Open Source Code | Yes | Code for reproducing all results is publicly available at https://github.com/arashtne/dahp. |
| Open Datasets | Yes | Specifically, we use LoRA [47] to fine-tune Llama-3-8B [48] for both reward learning and direct alignment on two relabeled variations of the HH-RLHF dataset [49]. ... We use several Pew Research Center surveys, specifically the American Trends Panel surveys number 35, 52, 79, 83, 99, 109, 111, 112, 114, 119, 120, 121, 126, 127, 128, 129, 130, 131, and 132. ... [52] Pew Research Center. About pew research center, 2025. URL https://www.pewresearch.org/about/. Accessed: 2025-01-20. |
| Dataset Splits | Yes | We filter for data points in which the sum of the number of tokens in the prompt and the number of tokens in the longer response do not exceed 512. This leaves us with 160,800 training and 17,104 test data points. |
| Hardware Specification | No | We only do small-scale experiments running on a single GPU in a few days. |
| Software Dependencies | No | While the paper mentions using specific software components like LoRA [47], Llama-3-8B [48], and the Adam optimizer, it does not provide specific version numbers for these components or the underlying programming languages/frameworks (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | We fine-tune Llama-3-8B [48] base model with LoRA [47]. We fine-tune for one epoch with a batch size of 2, and use a linear learning rate schedule that starts with 3 × 10⁻⁵ and decreases to zero. We use the Adam optimizer with a weight decay of 0.001 [53]. Regarding LoRA's hyper-parameters, we use the matrix rank of r = 8, α = 32, and the dropout probability of 0.1. |