Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Reverse Engineering Human Preferences with Reinforcement Learning

Authors: Lisa Alazraki, Yi-Chern Tan, Jon Ander Campos, Maximilian Mozes, Marek Rei, Max Bartolo

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 Experiments; 4.1 Models and Hyperparameters; 4.2 Datasets; 4.3 Baselines; 4.4 Results
Researcher Affiliation | Collaboration | Lisa Alazraki (Imperial College London); Tan Yi-Chern (Cohere); Jon Ander Campos (Cohere); Maximilian Mozes (Cohere); Marek Rei (Imperial College London); Max Bartolo (Cohere). Work done while at Cohere. Correspondence to EMAIL.
Pseudocode | No | The paper describes the RL problem and loss function mathematically and illustrates a pipeline in Figure 1, but it does not present structured pseudocode or an algorithm block.
Open Source Code | No | Answer: [No] Justification: All experiments were run on a proprietary framework specifically designed for fast training and inference of large models. Nevertheless, we provide exhaustive implementation details in Section 3 and Appendix B for full reproducibility in the framework of choice. All datasets used are open access with data locations specified in Appendix C.1. ... Note that, while we openly disclose our training algorithm and hyperparameters and train using publicly available data, we do not release our trained preamble generator checkpoints to the public, as this may encourage their misuse.
Open Datasets | Yes | We test all pipelines on MT-Bench [40], which consists of 160 open-ended questions... Since MT-Bench does not comprise a training set, we fine-tune and validate the preamble generators using questions from UltraFeedback [8]... Table 8: Details of training and testing datasets (columns: Dataset; ID; Split Used; License). UltraFeedback; openbmb/UltraFeedback; Train; MIT. MT-Bench; HuggingFaceH4/mt_bench_prompts; Test; Apache-2.0.
Dataset Splits | Yes | We test all pipelines on MT-Bench [40]... Since MT-Bench does not comprise a training set, we fine-tune and validate the preamble generators using questions from UltraFeedback [8]... In Table 8, we provide the Hugging Face IDs, data splits, and licenses of the open-access datasets used for training and inference. For all datasets, we use the official splits provided.
Hardware Specification | Yes | We tune the Command R7B [7] preamble generators on a Google Cloud TPU v5e containing 64 chips. We train the Llama 3.1 8B Instruct preamble generator on a single Nvidia H100 GPU.
Software Dependencies | No | The paper mentions LLMs like Command R and Llama 3.1 and optimizers like Adam, but it does not specify software library versions (e.g., PyTorch 1.9, TensorFlow 2.x) that would be needed to replicate the experiments.
Experiment Setup | Yes | For all three pipelines, we train with a batch size of 64, two gradient steps per batch, and a learning rate of 1e-6 using the Adam optimiser. To sample preambles from the generator, we set t = 4.0, k = 1.0, and p = 1.0, using a high temperature to ensure sufficient diversity among preambles conditioned on the same question. At inference, the sampling temperature is reduced to t = 0.5. Each preamble is limited to a maximum length of 512 tokens. The loss function hyperparameter β is tuned to a relatively small value (β = 0.03)... All hyperparameters, API model IDs, and training process details are given in Appendix B.
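
The temperature settings quoted under Experiment Setup can be illustrated with a small sketch. It assumes standard temperature-scaled softmax sampling, which the quoted text does not spell out. The logits and helper names below are hypothetical, made up for illustration, and are not taken from the paper or its framework. The sketch only shows why t = 4.0 gives a flatter, more diverse distribution than t = 0.5.

```python
import math

# Temperatures quoted in the Experiment Setup entry; constant names are illustrative.
TRAIN_TEMPERATURE = 4.0
INFERENCE_TEMPERATURE = 0.5


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a more spread-out distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical next-token logits, invented for this example (assumption).
toy_logits = [2.0, 1.0, 0.5, -1.0]

train_probs = softmax_with_temperature(toy_logits, TRAIN_TEMPERATURE)
infer_probs = softmax_with_temperature(toy_logits, INFERENCE_TEMPERATURE)

print(f"entropy at t={TRAIN_TEMPERATURE}: {entropy(train_probs):.3f}")
print(f"entropy at t={INFERENCE_TEMPERATURE}: {entropy(infer_probs):.3f}")
```

On these toy logits the higher temperature spreads probability mass more evenly across tokens, which is consistent with the stated goal of diverse preambles during training, while the lower inference temperature concentrates mass on the top-scoring token.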