Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Preference-Guided Diffusion for Multi-Objective Offline Optimization

Authors: Yashas Annadani, Syrine Belakaria, Stefano Ermon, Stefan Bauer, Barbara Engelhardt

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on various continuous offline multi-objective optimization tasks and find that it consistently outperforms other inverse/generative approaches while remaining competitive with forward/surrogate-based optimization methods.
Researcher Affiliation | Academia | Yashas Annadani (1,3), Syrine Belakaria (2), Stefano Ermon (2), Stefan Bauer (1,3), Barbara Engelhardt (2,4). 1: TU Munich; 2: Stanford University; 3: Helmholtz AI, Munich; 4: Gladstone Institutes
Pseudocode | Yes | Algorithm 1 Sampling from Preference Guided Diffusion
Open Source Code | Yes | Correspondence to EMAIL. Code available at https://github.com/yannadani/pgd_moo.
Open Datasets | Yes | Our evaluation closely follows the benchmarking effort provided in prior work [45]. We evaluate our approach on two sets of tasks: synthetic and real-world applications-based RE engineering suite [40]. Each task consists of a dataset of 60k offline datapoints. As in [45], we use 54k randomly chosen data points for training and the remaining for validation.
Dataset Splits | Yes | Each task consists of a dataset of 60k offline datapoints. As in [45], we use 54k randomly chosen data points for training and the remaining for validation.
Hardware Specification | Yes | All the experiments are run on an NVIDIA A100 GPU.
Software Dependencies | No | The paper mentions using AdamW optimizer [29] and Adam optimizer [20], and describes network architectures, but does not specify software library versions (e.g., PyTorch 1.x, TensorFlow 2.x, or specific Python versions).
Experiment Setup | Yes | We parameterize the unconditional denoising model to be a multi-layer perceptron (MLP) with two 512-dimensional hidden layers, followed by a ReLU nonlinearity and layer normalization [26]. We also incorporate sinusoidal time embedding [43] for conditioning. We parameterize the preference model to be an MLP with three hidden layers, with first two hidden layers having the same number of units as the input, while the last hidden layer is having 512 units. Similar to denoising model, we also use ReLU nonlinearity followed by layer normalization and sinusoidal time embedding. The denoising model is trained with AdamW optimizer [29] with learning rate of 5e-4 for up to 200 epochs. Following Ho et al. [17], we employ a linear noise schedule such that the noise β_t grows linearly from 1e-4 to 0.02. The preference model is trained with Adam optimizer [20] with learning rate of 1e-5 for up to 500 epochs. During sampling, we set the guidance weight w to 10.
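
To make the quoted setup more concrete, the following is a minimal, standard-library-only Python sketch of three ingredients mentioned above: the linear noise schedule (β_t from 1e-4 to 0.02), a sinusoidal time embedding, and the random 54k/6k train/validation split. It is an illustration under stated assumptions, not the authors' implementation (see the linked repository for that). The number of diffusion steps (1000), the embedding frequency base (10000), the random seed, and all function names are assumptions that do not appear in the text above.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise levels beta_t growing linearly from beta_start to beta_end."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha_bar(betas):
    """Running product of (1 - beta_t), as used in DDPM-style forward noising."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def sinusoidal_time_embedding(t, dim):
    """Embed a scalar timestep t as dim/2 sines followed by dim/2 cosines.

    The frequency base of 10000 is a common convention and an assumption here.
    """
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def split_indices(n_total=60_000, n_train=54_000, seed=0):
    """Randomly split dataset indices into training and validation sets."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # seed is a hypothetical choice
    return indices[:n_train], indices[n_train:]


if __name__ == "__main__":
    NUM_STEPS = 1000  # assumption: step count is not stated in the summary
    betas = linear_beta_schedule(NUM_STEPS)
    alpha_bars = cumulative_alpha_bar(betas)
    train_idx, val_idx = split_indices()
    print(betas[0], betas[-1], alpha_bars[-1])
    print(len(train_idx), len(val_idx))
    print(len(sinusoidal_time_embedding(10, 512)))
```

The schedule and split are kept as separate pure functions so that each quoted hyperparameter maps to one default argument and can be changed independently.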