Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

Preference Learning with Response Time: Robust Losses and Guarantees

Authors: Ayush Sawarni, Sahasrajit Sarmasarkar, Vasilis Syrgkanis

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our extensive set of experiments validate our theoretical findings in the context of preference learning over images.
Researcher Affiliation	Academia	Ayush Sawarni Stanford University EMAIL Sahasrajit Sarmasarkar Stanford University EMAIL Vasilis Syrgkanis Stanford University EMAIL
Pseudocode	Yes	Meta-Algorithm 1: Estimate Reward Model via Orthogonal Loss
Open Source Code	Yes	The experiment code is available in https://github.com/sawarniayush/Preference-Learning-with-Response-Time.
Open Datasets	Yes	We evaluate our approach on a real-world text-to-image preference dataset Pick-a-pick [KPS+23], which contains an approx 500k text-to-image dataset generated from several diffusion models.
Dataset Splits	Yes	For each training size N, we sample a new network (details in Appendix D) as the true reward model and draw N query pairs X1, X2 uniformly from the unit sphere. ... For each training size N, we draw N random image text pairs for training and an additional 10000 for testing (from the remaining dataset).
Hardware Specification	No	The paper does not provide specific hardware details such as GPU/CPU models or memory specifications used for the experiments.
Software Dependencies	No	The paper does not explicitly mention specific software dependencies with version numbers, such as programming languages, libraries, or frameworks.
Experiment Setup	Yes	We approximate it with a three-layer neural network... We generate synthetic data from random three-layer neural networks with sigmoid activations in the two hidden layers (widths 64 and 32) and a final linear output layer, fixed input dimension d = 10... learn the nuisance r by minimizing the logistic loss with a three-layer network of widths (10, 32, 16, 1), and learn the t-nuisance by minimizing squared error on T with a three-layer network of widths (20, 32, 16, 1) taking (X1, X2) concatenated as input. ... train a 4-layered feed-forward neural network with hidden layers of sizes 1024, 512, 256
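
The synthetic setup quoted in the Dataset Splits and Experiment Setup rows can be sketched as follows. This is a minimal illustration only, not the authors' code (which is at the repository linked above). It draws N query pairs uniformly from the unit sphere in d = 10 dimensions and builds a random three-layer network with sigmoid hidden layers of widths 64 and 32 and a linear output. The function names, the standard-normal weight initialization, the seed, and N = 1000 are assumptions made for this example; the paper defers the actual sampling details to its Appendix D.

```python
import numpy as np


def sample_unit_sphere(n, d, rng):
    """Draw n points uniformly from the unit sphere in R^d.

    Normalizing i.i.d. standard Gaussian vectors gives a uniform
    distribution on the sphere's surface.
    """
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def random_reward_network(d, rng, widths=(64, 32)):
    """Return a random three-layer network as a callable.

    Two sigmoid hidden layers (widths 64 and 32) and a linear output,
    as in the quoted setup. Standard-normal weights and biases are an
    assumption of this sketch, not taken from the paper.
    """
    sizes = (d, *widths, 1)
    params = [
        (rng.standard_normal((n_in, n_out)), rng.standard_normal(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

    def reward(x):
        h = x
        for w, b in params[:-1]:
            h = sigmoid(h @ w + b)   # hidden layers
        w, b = params[-1]
        return (h @ w + b).ravel()   # linear output, one scalar per input

    return reward


rng = np.random.default_rng(0)   # hypothetical seed
N, d = 1000, 10                  # N is illustrative; d = 10 as quoted

X1 = sample_unit_sphere(N, d, rng)
X2 = sample_unit_sphere(N, d, rng)
reward = random_reward_network(d, rng)
r1, r2 = reward(X1), reward(X2)  # true rewards for each item in a pair
```

The sketch stops at the true rewards for each pair. How preferences and response times are generated from r1 and r2, and how the nuisance networks of widths (10, 32, 16, 1) and (20, 32, 16, 1) are trained, is specified in the paper rather than in this table, so it is left out here.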