Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

ROOT: Rethinking Offline Optimization as Distributional Translation via Probabilistic Bridge

Authors: Cuong Dao, The Hung Tran, Phi Le Nguyen, Truong Thao Nguyen, Nghia Hoang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our proposed approach is evaluated on an extensive benchmark comprising most recent methods, demonstrating significant improvement and establishing a new state-of-the-art performance.
Researcher Affiliation	Academia	Manh Cuong Dao, National University of Singapore; The Hung Tran, Washington State University; Phi Le Nguyen, Hanoi University of Science and Technology; Thao Nguyen Truong, National Institute of Advanced Industrial Science and Technology; Trong Nghia Hoang, Washington State University
Pseudocode	Yes	Algorithm 1 Synthetic Data Generation via Simulating Gaussian Process (GP) Posteriors; Algorithm 2 Learning the Probabilistic Bridge and Simulation for Offline Optimization (ROOT)
Open Source Code	Yes	Our code is publicly available at https://github.com/cuong-dm/ROOT.
Open Datasets	Yes	Our investigation covers four real-world tasks selected from the Design-Bench [59] and three RNA-Binding tasks from Vienna RNA [44]. In Design-Bench, the chosen tasks cover both discrete and continuous domains. The discrete tasks, TF-Bind-8 and TF-Bind-10 [3], aim to discover DNA sequences with high binding affinity to a specific transcription factor (SIX6 REF R1).
Dataset Splits	No	The paper does not provide explicit training/validation/test splits for the primary offline dataset. It refers only to a 'static set of its observed input-output pairs' or 'offline data'. For evaluation, generated candidates are assessed at the 50th, 80th, and 100th percentiles, but this is an evaluation metric, not a dataset split. A specific 'few-shot' ablation uses a 1% labeled / 99% unlabeled split of the offline data, but this is not the general experimental setup.
Hardware Specification	Yes	All our experiments were conducted on a system with the following specifications: Ubuntu 20.04.5, a single NVIDIA A100-SXM4-80GB GPU, and CUDA 10.1.
Software Dependencies	No	The paper mentions 'Ubuntu 20.04.5' and 'CUDA 10.1' as part of the system specifications, and 'Adam optimizer' for training. However, it does not specify versions for key software libraries or frameworks such as Python, PyTorch, or TensorFlow, which are essential for reproducing the experimental setup.
Experiment Setup	Yes	For GP kernel hyper-parameters in our data generation, we sample lengthscales ℓ_s and variances σ_s² uniformly from [ℓ_0 − δ, ℓ_0 + δ] and [σ_0² − δ, σ_0² + δ], with ℓ_0 = σ_0² = 1.0 for continuous tasks and 6.25 for discrete tasks, and δ = 0.25. We use M = 100 gradient steps with step sizes 0.001 (continuous) and 0.05 (discrete). For training the Probabilistic Bridge model, we use a Brownian Bridge diffusion process with the Adam optimizer over E = 100 epochs and n_g = 800 synthetic functions, running on a single NVIDIA A100-SXM4-80GB GPU. The MLP is trained using the Adam optimizer for 100 epochs with a learning rate of 0.001. During each epoch, we sample n_e = 8 synthetic functions from the Gaussian process (the total number of synthetic functions is n_g = n_e × E = 8 × 100 = 800) and generate n_p = 1024 samples for each function. At the testing phase, we sample high-value design candidates from the 128 best designs in the offline dataset, using T = 200 sequential denoising steps.
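The Pseudocode row cites Algorithm 1, which generates synthetic data by simulating Gaussian process (GP) posteriors. The Python sketch below illustrates only that building block: it draws one joint sample from a zero-mean GP posterior using the standard library alone. Several details are assumptions made for this example and are not taken from the paper or the ROOT repository: the squared-exponential (RBF) kernel, the noise and jitter values, and every function name.

```python
import math
import random


def rbf_kernel(a, b, lengthscale, variance):
    """Assumed squared-exponential kernel: variance * exp(-||a - b||^2 / (2 * lengthscale^2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return variance * math.exp(-0.5 * sq_dist / lengthscale ** 2)


def cholesky(K):
    """Return lower-triangular L with L L^T = K for a symmetric positive-definite K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def cho_solve(L, b):
    """Solve (L L^T) x = b by forward substitution, then back substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def sample_gp_posterior(X, y, X_star, lengthscale, variance,
                        noise=1e-4, jitter=1e-6, rng=None):
    """Draw one joint sample of f(X_star) from a zero-mean GP conditioned on (X, y).

    noise and jitter are illustrative values (hypothetical), not settings from the paper.
    """
    if rng is None:
        rng = random.Random()

    def k(a, b):
        return rbf_kernel(a, b, lengthscale, variance)

    # Training covariance with observation noise added on the diagonal.
    K = [[k(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = cho_solve(L, y)                       # K^{-1} y
    K_s = [[k(s, a) for a in X] for s in X_star]  # test/train cross-covariances
    mean = [sum(c * w for c, w in zip(row, alpha)) for row in K_s]
    V = [cho_solve(L, row) for row in K_s]        # K^{-1} k(X, s) for each test point s
    # Posterior covariance K** - K*^T K^{-1} K*, plus jitter for numerical stability.
    cov = [[k(si, sj) - sum(c * v for c, v in zip(K_s[i], V[j]))
            + (jitter if i == j else 0.0)
            for j, sj in enumerate(X_star)] for i, si in enumerate(X_star)]
    L_post = cholesky(cov)
    eps = [rng.gauss(0.0, 1.0) for _ in X_star]
    # Correlated sample: mean + L_post @ eps.
    return [m + sum(L_post[i][c] * eps[c] for c in range(i + 1))
            for i, m in enumerate(mean)]
```

For example, with X = [[0.0], [1.0], [2.0]] and y = [0.0, 1.0, 0.5], sampling at the training inputs returns values close to y, because the assumed noise variance is tiny. This sketch does not reproduce how Algorithm 1 turns such samples into training data for the bridge.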
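The Experiment Setup row fixes several numeric settings. The sketch below collects them and shows two pieces they feed into. The first draws kernel hyper-parameters uniformly within ±δ of the base value. The second samples an intermediate state of a Brownian bridge. The row itself states only that a Brownian Bridge diffusion process is used, so several details here are assumptions: the schedule follows the style of Brownian Bridge Diffusion Models (BBDM), with m_t = t/T and variance 2s(m_t − m_t²), and the function names and scale s are hypothetical.

```python
import math
import random

# Settings stated in the Experiment Setup row.
BASE_HPARAM = {"continuous": 1.0, "discrete": 6.25}       # ℓ_0 = σ_0²
DELTA = 0.25                                              # δ
GRAD_STEPS = 100                                          # M
GRAD_STEP_SIZE = {"continuous": 0.001, "discrete": 0.05}
EPOCHS = 100                                              # E
FUNCS_PER_EPOCH = 8                                       # n_e
SAMPLES_PER_FUNC = 1024                                   # n_p
DENOISE_STEPS = 200                                       # T
TOTAL_FUNCS = FUNCS_PER_EPOCH * EPOCHS                    # n_g = n_e × E = 800


def sample_kernel_hparams(task_type, rng):
    """Draw (lengthscale, variance), each uniform on [base - δ, base + δ]."""
    base = BASE_HPARAM[task_type]
    lengthscale = rng.uniform(base - DELTA, base + DELTA)
    variance = rng.uniform(base - DELTA, base + DELTA)
    return lengthscale, variance


def bridge_marginal(x_a, x_b, t, T=DENOISE_STEPS, s=1.0, rng=None):
    """Sample x_t of a Brownian bridge pinned at x_a (t = 0) and x_b (t = T).

    Assumed BBDM-style schedule: mean (1 - m_t) x_a + m_t x_b with m_t = t / T,
    and per-coordinate variance 2 s (m_t - m_t^2), which is zero at both ends.
    """
    if rng is None:
        rng = random.Random()
    m = t / T
    std = math.sqrt(2.0 * s * (m - m * m))
    return [(1.0 - m) * a + m * b + std * rng.gauss(0.0, 1.0)
            for a, b in zip(x_a, x_b)]
```

Under this schedule, bridge_marginal(a, b, 0) returns a unchanged and bridge_marginal(a, b, T) returns b. At intermediate t, noise is added, with its variance peaking at t = T/2. The sketch leaves two details to the paper and its code: which endpoint corresponds to the offline design and which to the improved design, and how the 1024 samples per function are formed.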
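The Dataset Splits row separates percentile-based evaluation from an actual data split, and it mentions a 1% labeled / 99% unlabeled few-shot ablation. The hypothetical helpers below illustrate both ideas:

- reporting the 50th, 80th, and 100th percentiles of a batch of candidate scores, using linear interpolation between order statistics (the same rule as NumPy's default `percentile` method);
- a random index split.

Several details are not specified here and are left out: score normalization, the number of candidates, and how the paper actually draws its few-shot subset.

```python
import math
import random


def percentile(values, q):
    """q-th percentile (0 <= q <= 100), linearly interpolating between sorted values."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of empty sequence")
    pos = (len(xs) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def summarize(scores):
    """Report the 50th, 80th, and 100th percentiles of a batch of candidate scores."""
    return {q: percentile(scores, q) for q in (50, 80, 100)}


def few_shot_split(n, labeled_frac=0.01, seed=0):
    """Hypothetical split of indices 0..n-1 into a small labeled part and an unlabeled rest."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    k = max(1, round(n * labeled_frac))
    return idx[:k], idx[k:]
```

For example, summarize([5, 3, 1, 4, 2]) gives 3 for the 50th percentile and 5 for the 100th. few_shot_split(1000) yields 10 labeled and 990 unlabeled indices.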