Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

MOBO-OSD: Batch Multi-Objective Bayesian Optimization via Orthogonal Search Directions

Authors: Lam Ngo, Huong Ha, Jeffrey Chan, Hongyu Zhang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through extensive experiments and analysis on a variety of synthetic and real-world benchmark functions with two to six objectives, we demonstrate that MOBO-OSD consistently outperforms the state-of-the-art algorithms.
Researcher Affiliation	Academia	Lam Ngo School of Computing Technologies RMIT University, Australia EMAIL Huong Ha School of Computing Technologies RMIT University, Australia EMAIL Jeffrey Chan School of Computing Technologies RMIT University, Australia EMAIL Hongyu Zhang School of Big Data and Software Engineering Chongqing University, China EMAIL
Pseudocode	Yes	Algorithm 1: The MOBO-OSD Algorithm. 1: Input: Objective function f(.), evaluation budget T, batch size b, number of weight vectors nβ. 2: Output: The Pareto set Ps. 3: Initialize data points and append to the observed dataset D. 4: while t < T do 5: Compute approximated CHIM and define nβ OSDs (Sec. 4.1). 6: Train GPs for each objective function fm. 7: for each point U(β) on the approximated CHIM do 8: Optimize the MOBO-OSD subproblem to generate a candidate x OSD(β) (Eq. (2)). 9: Estimate the Pareto front around x OSD(β) to explore more candidates x PFE(β) (Eq. (3)). 10: Append Xc ← Xc ∪ x PFE(β). 11: end for 12: Select a batch of b solutions from Xc and evaluate them; increase t ← t + b (Eq. (4)). 13: end while 14: Return Ps from dataset D.
Open Source Code	Yes	Our code implementation can be found at https://github.com/LamNgo1/mobo-osd.
Open Datasets	Yes	We conduct experiments on five synthetic and four real-world multi-objective benchmark problems. ... For synthetic benchmark problems, we use DTLZ2 with different objective settings M ∈ {2, 3, 4} [17], ZDT1 [15], and VLMOP2 [53]. For real-world benchmark problems, we use various problems from the RE problem suite [50] including Speed Reducer, Car Side Design, Marine Design, and Water Planning. The dimensionality and number of objective settings for each function are given in Table 2. These problems are widely used in the MOBO literature [5, 9, 13, 12, 1, 33]. Details of the benchmark problems can be found in Appendix A.6.
Dataset Splits	No	The paper does not provide specific train/test/validation splits for datasets, as Bayesian Optimization typically involves sequential evaluations of objective functions rather than partitioning a fixed dataset.
Hardware Specification	Yes	We run experiments on a computing server with a Dual CPU of type AMD EPYC 7662 (total of 128 Threads, 256 CPUs). Each experiment is allocated 8 CPUs and 64GB Memory. The server is installed with Ubuntu (20.04.3 LTS) Operating System.
Software Dependencies	No	We implemented MOBO-OSD and all baselines in Python (version 3.10). For the surrogate model, we implement the GPs via GPyTorch [20] and BoTorch [3]. We follow [33] and use Matérn 5/2 kernel... We solve Eq. (2) using a gradient-based off-the-shelf optimizer, e.g., SLSQP [35]. We use NSGA-II implementation from pymoo [7].
Experiment Setup	Yes	For the number of points on approximated CHIM, we set the default value nβ = 20 and present an ablation study of other settings in Sec. 5.2. For the scaling of confidence region, we use the common 95% confidence interval, i.e., δ = 1.96 [21]. For the number of starting points when solving MOBO-OSD subproblem, we set ns = 4. ... For the surrogate model, we implement the GPs via GPyTorch [20] and BoTorch [3]. We follow [33] and use Matérn 5/2 kernel with the ARD length-scales in the interval [10^-3, 10^3]. The Gaussian likelihood is modeled with standard homoskedastic noise in the interval [10^-6, 10^-3]. ... For NSGA-II [16], we use the default settings as follows: population size of 100, binary tournament selection, simulated binary crossover (probability p = 0.9, exponential distribution parameter η = 15) and polynomial mutation (probability p = 0.9, exponential distribution parameter η = 20).
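
For readers who want to re-run the experiments, the settings quoted in the Experiment Setup row can be collected in one place. The sketch below is illustrative only: the class and field names (MoboOsdSettings, NSGA2Settings, n_starts, and so on) are hypothetical and are not taken from the authors' repository. Only the numeric values come from the quoted text.

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass(frozen=True)
class NSGA2Settings:
    """NSGA-II defaults as quoted in the Experiment Setup row."""
    population_size: int = 100
    crossover_prob: float = 0.9   # simulated binary crossover
    crossover_eta: float = 15.0
    mutation_prob: float = 0.9    # polynomial mutation, value as quoted
    mutation_eta: float = 20.0


@dataclass(frozen=True)
class MoboOsdSettings:
    """Hypothetical container: names are assumptions, values are quoted."""
    n_beta: int = 20                                        # points on the approximated CHIM
    delta: float = 1.96                                     # 95% confidence-region scaling
    n_starts: int = 4                                       # starting points per subproblem (ns)
    lengthscale_bounds: Tuple[float, float] = (1e-3, 1e3)   # ARD length-scale interval
    noise_bounds: Tuple[float, float] = (1e-6, 1e-3)        # homoskedastic noise interval
    nsga2: NSGA2Settings = field(default_factory=NSGA2Settings)

    def validate(self) -> None:
        """Raise ValueError if any setting is outside a sensible range."""
        if self.n_beta <= 0 or self.n_starts <= 0:
            raise ValueError("n_beta and n_starts must be positive")
        if self.delta <= 0:
            raise ValueError("delta must be positive")
        for name in ("lengthscale_bounds", "noise_bounds"):
            lo, hi = getattr(self, name)
            if not 0 < lo < hi:
                raise ValueError(f"{name} must satisfy 0 < lower < upper")
        for p in (self.nsga2.crossover_prob, self.nsga2.mutation_prob):
            if not 0.0 <= p <= 1.0:
                raise ValueError("probabilities must lie in [0, 1]")


settings = MoboOsdSettings()
settings.validate()
```

Keeping the values in a frozen dataclass makes an ablation (for example, a different n_beta) an explicit, separately constructed object rather than an in-place mutation. How the authors' code actually organizes these values may differ.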