Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Preference Learning with Response Time: Robust Losses and Guarantees
Authors: Ayush Sawarni, Sahasrajit Sarmasarkar, Vasilis Syrgkanis
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive set of experiments validate our theoretical findings in the context of preference learning over images. |
| Researcher Affiliation | Academia | Ayush Sawarni Stanford University EMAIL Sahasrajit Sarmasarkar Stanford University EMAIL Vasilis Syrgkanis Stanford University EMAIL |
| Pseudocode | Yes | Meta-Algorithm 1: Estimate Reward Model via Orthogonal Loss |
| Open Source Code | Yes | The experiment code is available in https://github.com/sawarniayush/Preference-Learning-with-Response-Time. |
| Open Datasets | Yes | We evaluate our approach on a real-world text-to-image preference dataset Pick-a-Pic [KPS+23], which contains an approx 500k text-to-image dataset generated from several diffusion models. |
| Dataset Splits | Yes | For each training size N, we sample a new network (details in Appendix D) as the true reward model and draw N query pairs X1, X2 uniformly from the unit sphere. ... For each training size N, we draw N random image text pairs for training and an additional 10000 for testing (from the remaining dataset). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory specifications used for the experiments. |
| Software Dependencies | No | The paper does not explicitly mention specific software dependencies with version numbers, such as programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | We approximate it with a three-layer neural network... We generate synthetic data from random three-layer neural networks with sigmoid activations in the two hidden layers (widths 64 and 32) and a final linear output layer, fixed input dimension d = 10... learn the nuisance r by minimizing the logistic loss with a three-layer network of widths (10, 32, 16, 1), and learn the t-nuisance by minimizing squared error on T with a three-layer network of widths (20, 32, 16, 1) taking (X1, X2) concatenated as input. ... train a 4-layered feed-forward neural network with hidden layers of sizes 1024, 512, 256 |
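
The synthetic setup quoted in the Dataset Splits and Experiment Setup rows can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names, the seeds, and the weight initialization (zero biases, Gaussian weights scaled by 1/sqrt(fan-in)) are assumptions, since the excerpts defer those details to Appendix D of the paper. Only the layer widths (64 and 32), the sigmoid hidden activations, the linear output, the input dimension d = 10, and the uniform sampling on the unit sphere come from the quoted text.

```python
import math
import random


def sample_unit_sphere(d, rng):
    """Draw one point uniformly from the unit sphere in R^d.

    Normalizing a standard Gaussian vector gives a uniform direction.
    """
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def random_layer(n_in, n_out, rng):
    """Random dense layer. The initialization scheme is an assumption."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def make_reward_network(d=10, widths=(64, 32), seed=0):
    """Three-layer network: two sigmoid hidden layers, then a linear output."""
    rng = random.Random(seed)
    sizes = [d, *widths, 1]
    return [random_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def reward(network, x):
    """Evaluate the scalar reward of a single input vector x."""
    h = x
    last = len(network) - 1
    for i, (weights, biases) in enumerate(network):
        z = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, biases)]
        # Sigmoid on the hidden layers only; the output layer stays linear.
        h = z if i == last else [sigmoid(v) for v in z]
    return h[0]


def sample_query_pairs(n, d=10, seed=1):
    """Draw n query pairs (X1, X2), each point uniform on the unit sphere."""
    rng = random.Random(seed)
    return [(sample_unit_sphere(d, rng), sample_unit_sphere(d, rng)) for _ in range(n)]
```

For a given training size N, one would call `make_reward_network` with a fresh seed to obtain the true reward model and `sample_query_pairs(N)` to obtain the queries. Generating preferences and response times from the reward differences is not covered by the quoted excerpts, so it is left out here.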