Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Risk-aware Direct Preference Optimization under Nested Risk Measure

Authors: Lijun Zhang, Lin Li, Yajie Qi, Huizhong Song, Yaodong Yang, Jun Wang, Wei Wei

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results across three open-source datasets: IMDb Dataset, Anthropic HH Dataset, and Alpaca Eval, demonstrate the proposed method's superior performance in balancing alignment performance and model drift.
Researcher Affiliation | Academia | 1. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi, China. 2. Institute for AI, Peking University, Beijing, China. 3. University College London.
Pseudocode | Yes | In this subsection, we provide the main pseudocode for Risk-aware Direct Preference Optimization (Ra-DPO), as outlined in Algorithm 1.
Open Source Code | Yes | We trained Ra-DPO and the baseline models based on the original KTO implementation (https://github.com/ContextualAI/HALOs), and our code can be found in the supplemental material.
Open Datasets | Yes | Experimental results across three open-source datasets: IMDb Dataset, Anthropic HH Dataset, and Alpaca Eval, demonstrate the proposed method's superior performance in balancing alignment performance and model drift. IMDb Dataset [32]: https://huggingface.co/datasets/stanfordnlp/imdb; Anthropic HH Dataset [33]: https://huggingface.co/datasets/Anthropic/hh-rlhf; Alpaca Eval [34]: https://huggingface.co/datasets/tatsu-lab/alpaca_eval
Dataset Splits | No | The paper mentions using IMDb Dataset, Anthropic HH Dataset, and Alpaca Eval for experiments but does not explicitly provide training, validation, or test splits (e.g., percentages, sample counts, or references to predefined splits).
Hardware Specification | Yes | All reported results of our algorithm and baseline algorithms are trained using 4 A100 GPUs, each with 40GB of memory.
Software Dependencies | No | The paper refers to the KTO implementation for parameter settings and lists hyperparameters in Tables 2 and 3, but does not provide specific software dependencies with version numbers (e.g., Python or PyTorch versions, or other libraries).
Experiment Setup | Yes | In our experiments, we followed the original KTO implementation for the main parameter settings, and both Ra-DPO and the baseline models used the same hyperparameters, as detailed in Tables 2 and 3. Table 2: Hyperparameters in loss functions for different algorithms. Table 3: Hyperparameters in network training (max length 512; max prompt length 256; gradient accumulation steps 4; learning rate 5 × 10⁻⁶; optimizer AdamW).
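As a reading aid, the Table 3 values and the 4-GPU setup from the Hardware Specification row can be collected in a small Python config. This is a minimal sketch, not the authors' code or the HALOs code. Three parts are assumptions: the truncation policy in the hypothetical helper `truncate_pair` (keep the end of the prompt, then cut the completion to fit), the per-device batch size of 8 in the usage line, and the effective-batch formula, which presumes plain data-parallel training.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TrainingConfig:
    # Values transcribed from Table 3 of the paper.
    max_length: int = 512              # prompt + completion, in tokens
    max_prompt_length: int = 256       # prompt alone, in tokens
    gradient_accumulation_steps: int = 4
    learning_rate: float = 5e-6
    optimizer: str = "AdamW"
    # From the hardware row: 4 A100 GPUs with 40GB each.
    num_gpus: int = 4


def truncate_pair(prompt: List[int], completion: List[int],
                  cfg: TrainingConfig) -> Tuple[List[int], List[int]]:
    """Hypothetical truncation policy (an assumption, not from the paper or HALOs).

    Keeps the last `max_prompt_length` prompt tokens, then cuts the completion
    so that prompt + completion fits within `max_length` tokens.
    """
    start = max(0, len(prompt) - cfg.max_prompt_length)
    prompt = prompt[start:]
    budget = max(0, cfg.max_length - len(prompt))
    return prompt, completion[:budget]


def effective_batch_size(per_device_batch: int, cfg: TrainingConfig) -> int:
    """Sequences per optimizer step, assuming plain data parallelism."""
    return per_device_batch * cfg.num_gpus * cfg.gradient_accumulation_steps


if __name__ == "__main__":
    cfg = TrainingConfig()
    p, c = truncate_pair(list(range(300)), list(range(400)), cfg)
    print(len(p), len(c))                 # 256 256
    # The per-device batch size of 8 is hypothetical; the paper row does not state it.
    print(effective_batch_size(8, cfg))   # 128
```

The two length limits interact. Once a prompt is capped at 256 tokens, at least 256 tokens remain for the completion under the 512-token total. Likewise, 4 GPUs times 4 accumulation steps means each optimizer step sees 16 times the per-device micro-batch.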