Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Risk-aware Direct Preference Optimization under Nested Risk Measure
Authors: Lijun Zhang, Lin Li, Yajie Qi, Huizhong Song, Yaodong Yang, Jun Wang, Wei Wei
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results across three open-source datasets: IMDb Dataset, Anthropic HH Dataset, and Alpaca Eval, demonstrate the proposed method's superior performance in balancing alignment performance and model drift. |
| Researcher Affiliation | Academia | 1. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi, China. 2. Institute for AI, Peking University, Beijing, China. 3. University College London. |
| Pseudocode | Yes | In this subsection, we provide the main pseudocode for Risk-aware Direct Preference Optimization (Ra-DPO), as outlined in Algorithm 1. |
| Open Source Code | Yes | We trained Ra-DPO and the baseline models based on the original KTO implementation https://github.com/ContextualAI/HALOs, and our code can be found in the supplemental material. |
| Open Datasets | Yes | Experimental results across three open-source datasets: IMDb Dataset, Anthropic HH Dataset, and Alpaca Eval, demonstrate the proposed method's superior performance in balancing alignment performance and model drift. IMDb Dataset [32]: https://huggingface.co/datasets/stanfordnlp/imdb; Anthropic HH Dataset [33]: https://huggingface.co/datasets/Anthropic/hh-rlhf; Alpaca Eval [34]: https://huggingface.co/datasets/tatsu-lab/alpaca_eval |
| Dataset Splits | No | The paper mentions using IMDb Dataset, Anthropic HH Dataset, and Alpaca Eval for experiments, but does not explicitly provide training, validation, or test dataset splits (e.g., percentages, sample counts, or references to predefined splits). |
| Hardware Specification | Yes | All reported results of our algorithm and baseline algorithms are trained using 4 A100 GPUs, each with 40GB of memory. |
| Software Dependencies | No | The paper refers to the KTO implementation for parameter settings and lists hyperparameters in Tables 2-3, but does not specify software dependencies with version numbers (e.g., Python, PyTorch versions, or other libraries). |
| Experiment Setup | Yes | In our experiments, we followed the original KTO implementation for the main parameter settings, and both Ra-DPO and the baseline models used the same hyperparameters, as detailed in Table 2-3. Table 2: Hyperparameters in loss functions for different algorithms. Table 3: Hyperparameters in network training (parameter: value): max length: 512; max prompt length: 256; gradient accumulation steps: 4; learning rate: 5 × 10⁻⁶; optimizer: AdamW. |
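
The training settings quoted in the Experiment Setup and Hardware Specification rows can be collected into a single configuration object, which may be convenient when attempting a reproduction. The following is a minimal sketch, not the authors' code: the class name `TrainingConfig` and all field names are hypothetical, and the learning rate assumes that the extracted text "5 10 6" stands for 5 × 10⁻⁶. The loss-function hyperparameters of Table 2 are not quoted on this page and are therefore omitted.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container for the Table 3 values quoted in the report."""

    max_length: int = 512
    max_prompt_length: int = 256
    gradient_accumulation_steps: int = 4
    # Assumption: the garbled "5 10 6" in the extracted text means 5e-6.
    learning_rate: float = 5e-6
    optimizer: str = "AdamW"
    # From the Hardware Specification row: 4 A100 GPUs with 40GB each.
    num_gpus: int = 4
    gpu_memory_gb: int = 40


config = TrainingConfig()

# Print the settings one per line so they can be compared against the paper.
for name, value in asdict(config).items():
    print(f"{name}: {value}")
```

Because the dataclass is frozen, the values cannot be modified accidentally after construction; a variant run would create a new instance with explicit overrides, for example `TrainingConfig(learning_rate=1e-6)`.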