Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Preference-Guided Diffusion for Multi-Objective Offline Optimization

Authors: Yashas Annadani, Syrine Belakaria, Stefano Ermon, Stefan Bauer, Barbara Engelhardt

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate our approach on various continuous offline multi-objective optimization tasks and find that it consistently outperforms other inverse/generative approaches while remaining competitive with forward/surrogate-based optimization methods.
Researcher Affiliation	Academia	Yashas Annadani 1,3 Syrine Belakaria2 Stefano Ermon2 Stefan Bauer1,3 Barbara Engelhardt2,4 1 TU Munich 2 Stanford University 3 Helmholtz AI, Munich 4 Gladstone Institutes
Pseudocode	Yes	Algorithm 1 Sampling from Preference Guided Diffusion
Open Source Code	Yes	Correspondence to EMAIL. Code available at https://github.com/yannadani/pgd_moo.
Open Datasets	Yes	Our evaluation closely follows the benchmarking effort provided in prior work [45]. We evaluate our approach on two sets of tasks: synthetic and real-world applications-based RE engineering suite [40]. Each task consists of a dataset of 60k offline datapoints. As in [45], we use 54k randomly chosen data points for training and the remaining for validation.
Dataset Splits	Yes	Each task consists of a dataset of 60k offline datapoints. As in [45], we use 54k randomly chosen data points for training and the remaining for validation.
Hardware Specification	Yes	All the experiments are run on an NVIDIA A100 GPU.
Software Dependencies	No	The paper mentions using the AdamW optimizer [29] and the Adam optimizer [20], and describes network architectures, but does not specify software library versions (e.g., PyTorch 1.x, TensorFlow 2.x, or specific Python versions).
Experiment Setup	Yes	We parameterize the unconditional denoising model to be a multi-layer perceptron (MLP) with two 512-dimensional hidden layers, followed by a ReLU nonlinearity and layer normalization [26]. We also incorporate sinusoidal time embedding [43] for conditioning. We parameterize the preference model to be an MLP with three hidden layers, with first two hidden layers having the same number of units as the input, while the last hidden layer is having 512 units. Similar to denoising model, we also use ReLU nonlinearity followed by layer normalization and sinusoidal time embedding. The denoising model is trained with AdamW optimizer [29] with learning rate of 5e-4 for up to 200 epochs. Following Ho et al. [17], we employ a linear noise schedule such that the noise β_t grows linearly from 1e-4 to 0.02. The preference model is trained with Adam optimizer [20] with learning rate of 1e-5 for up to 500 epochs. During sampling, we set the guidance weight w to 10.
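
The linear noise schedule quoted in the Experiment Setup row can be illustrated with a short sketch. This is not taken from the authors' repository: the function names are hypothetical, and the number of diffusion steps (1000 here) is an assumption, since the excerpt above gives only the endpoints 1e-4 and 0.02. The cumulative product follows the standard DDPM definition of ᾱ_t as the product of (1 - β_s) for s ≤ t.

```python
def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return beta_1..beta_T, growing linearly from beta_start to beta_end."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for each step t."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


# Assumption: the step count is not stated in the excerpt; 1000 is illustrative.
NUM_STEPS = 1000
betas = linear_beta_schedule(NUM_STEPS)
alpha_bars = cumulative_alphas(betas)
```

With these endpoints, `alpha_bars` decreases monotonically toward zero, so the signal retained from a clean sample shrinks at every step. A different step count changes how fast it decays but not the endpoints of `betas`.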