Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Multi-Objective Causal Bayesian Optimization

Authors: Shriya Bhatija, Paul-David Zuercher, Jakob Thumm, Thomas Bohné

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on synthetic and real-world causal graphs demonstrate the superiority of our approach over non-causal multi-objective Bayesian optimization in settings where causal information is available.
Researcher Affiliation	Academia	1Department of Computer Engineering, Technical University of Munich, Munich, Germany 2Department of Engineering, University of Cambridge, Cambridge, United Kingdom 3The Alan Turing Institute, London, United Kingdom. Correspondence to: Shriya Bhatija <EMAIL>.
Pseudocode	Yes	We propose our algorithm to solve MO-CBO problems¹, for which the procedure is summarized in Algorithm 1. It assumes a known causal graph G, Y, X, C, prior data D, and a set S {O_{G,Y}, M_{G,Y}, P(X)} that specifies which local problems to consider.
Open Source Code	Yes	¹ The full implementation of our algorithm is available at https://github.com/ShriyaBhatija/MO-CBO
Open Datasets	Yes	The model is inspired by the German Credit UCI dataset (Murphy, 1994), with causal dependencies adapted from Karimi et al. (2020). ... Murphy, P. M. UCI repository of machine learning databases, 1994. URL ftp://ftp.ics.uci.edu/pub/machine-learning-databases/. ... This model originates from previous works of Ferro et al. (2015), and is based on real-world causal relationships.
Dataset Splits	No	We assume to have an initial dataset $D = \{((X_s, x_s^k), \mu(X_s, x_s^k))\}_{k=1,s=1}^{K,\lvert S \rvert}$ with K = 5 samples per intervention set.
Hardware Specification	Yes	All experiments were executed on a machine equipped with an Apple M2 processor and 8GB of RAM.
Software Dependencies	No	We implement qNEHVI using the botorch library.
Experiment Setup	Yes	The batch size is set to 5. For reproducibility, all experiments are run across 10 random seeds, resulting in varying initializations of D. ... ParEGO: ...σ = 0.5 as initial standard deviation. ... TSEMO: ...use 100 points for spectral sampling. ... qNEHVI: ...use 10 optimization restarts, and 64 raw samples for acquisition maximization. Moreover, the acquisition function uses a Sobol QMC sampler with 128 samples.
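
The settings quoted in the Experiment Setup and Dataset Splits rows can be collected into a single configuration, as in the minimal sketch below. It is not taken from the authors' repository: the field names, the seed values 0 to 9, the intervention sets, the variable bounds, and the uniform sampling of initial intervention values are all assumptions made for illustration. Only the numeric settings come from the rows above.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Numeric settings quoted in the table; field names are hypothetical."""
    batch_size: int = 5
    num_seeds: int = 10
    initial_samples_per_set: int = 5   # K in the Dataset Splits row
    parego_initial_std: float = 0.5    # sigma for ParEGO
    tsemo_spectral_points: int = 100
    qnehvi_num_restarts: int = 10
    qnehvi_raw_samples: int = 64
    qnehvi_sobol_samples: int = 128


def initial_design(intervention_sets, bounds, config, seed):
    """Draw K intervention values per intervention set for one seed.

    Uniform sampling within `bounds` is an assumption, not the
    procedure documented in the paper.
    """
    rng = random.Random(seed)  # seeded generator, so a seed fixes the design
    design = {}
    for variables in intervention_sets:
        design[variables] = [
            tuple(rng.uniform(*bounds[v]) for v in variables)
            for _ in range(config.initial_samples_per_set)
        ]
    return design


config = ExperimentConfig()

# Hypothetical intervention sets and variable ranges, for illustration only.
intervention_sets = [("X1",), ("X1", "X2")]
bounds = {"X1": (-1.0, 1.0), "X2": (0.0, 2.0)}

# One initial design per seed; seeds 0..9 are an assumed choice.
designs = [
    initial_design(intervention_sets, bounds, config, seed)
    for seed in range(config.num_seeds)
]
```

Each entry of `designs` holds one seed's initial intervention values, which is one way the "varying initializations of D" across seeds could be reproduced.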