Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Risk-aware Direct Preference Optimization under Nested Risk Measure

Authors: Lijun Zhang, Lin Li, Yajie Qi, Huizhong Song, Yaodong Yang, Jun Wang, Wei Wei

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results across three open-source datasets: IMDb Dataset, Anthropic HH Dataset, and Alpaca Eval, demonstrate the proposed method's superior performance in balancing alignment performance and model drift.
Researcher Affiliation	Academia	1. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi, China. 2. Institute for AI, Peking University, Beijing, China. 3. University College London.
Pseudocode	Yes	In this subsection, we provide the main pseudocode for Risk-aware Direct Preference Optimization (Ra-DPO), as outlined in Algorithm 1.
Open Source Code	Yes	We trained Ra-DPO and the baseline models based on the original KTO implementation https://github.com/ContextualAI/HALOs, and our code can be found in the supplemental material.
Open Datasets	Yes	Experimental results across three open-source datasets: IMDb Dataset, Anthropic HH Dataset, and Alpaca Eval, demonstrate the proposed method's superior performance in balancing alignment performance and model drift. IMDb Dataset [32]: https://huggingface.co/datasets/stanfordnlp/imdb; Anthropic HH Dataset [33]: https://huggingface.co/datasets/Anthropic/hh-rlhf; Alpaca Eval [34]: https://huggingface.co/datasets/tatsu-lab/alpaca_eval
Dataset Splits	No	The paper mentions using IMDb Dataset, Anthropic HH Dataset, and Alpaca Eval for experiments, but does not explicitly provide training, validation, or test dataset splits (e.g., percentages, sample counts, or references to predefined splits).
Hardware Specification	Yes	All reported results of our algorithm and baseline algorithms are trained using 4 A100 GPUs, each with 40GB of memory.
Software Dependencies	No	The paper refers to the KTO implementation for parameter settings and lists hyperparameters in Tables 2-3, but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch versions, or other libraries).
Experiment Setup	Yes	In our experiments, we followed the original KTO implementation for the main parameter settings, and both Ra-DPO and the baseline models used the same hyperparameters, as detailed in Tables 2-3. Table 2: Hyperparameters in loss functions for different algorithms. Table 3: Hyperparameters in network training. Parameter: value. max length: 512; max prompt length: 256; gradient accumulation steps: 4; learning rate: 5 × 10⁻⁶; optimizer: AdamW.
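
For anyone attempting a rerun, the network-training values quoted in the Experiment Setup row can be gathered into one configuration object. The following is a minimal sketch only: the class name `TrainingConfig`, the field names, and the derived `max_completion_length` property are hypothetical and are not taken from the paper or from the HALOs codebase. The learning rate is read as 5 × 10⁻⁶, which is an assumption about the exponent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container for the training values quoted from Table 3."""

    max_length: int = 512                  # max total sequence length
    max_prompt_length: int = 256           # max prompt length
    gradient_accumulation_steps: int = 4
    learning_rate: float = 5e-6            # read as 5 x 10^-6 (assumption about the exponent)
    optimizer: str = "AdamW"

    @property
    def max_completion_length(self) -> int:
        # Derived for illustration only; the page does not report this value.
        return self.max_length - self.max_prompt_length


config = TrainingConfig()
print(config)
print("tokens left for the completion:", config.max_completion_length)
```

Batch size, number of epochs, and library versions are not listed on this page, so they are deliberately left out rather than guessed.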