Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

System-Aware Neural ODE Processes for Few-Shot Bayesian Optimization

Authors: Jixiang Qing, Rebecca D. Langdon, Robert Matthew Lee, Behrang Shafei, Mark van der Wilk, Calvin Tsay, Ruth Misener

TMLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct extensive experiments showcasing SANODEP's potential for few-shot BO within dynamical systems. This section conducts experiments on meta-learning and few-shot BO for dynamical systems.
Researcher Affiliation	Collaboration	Jixiang Qing¹, Becky D Langdon¹, Robert M Lee², Behrang Shafei², Mark van der Wilk³, Calvin Tsay¹, Ruth Misener¹; ¹Imperial College London, ²BASF SE, ³University of Oxford
Pseudocode	Yes	Algorithm 1 Learning and Inference in System Aware Neural ODE Processes (SANODEP) Algorithm 2 Model Assisted Ordinary Differential Equation Optimization Framework
Open Source Code	Yes	All models are implemented using Flax (Heek et al., 2023) and are open source, available in: https://github.com/TsingQAQ/SANODEP.
Open Datasets	No	Following Norcliffe et al. (2021), we treat F as a parametric function of a specific kinetic model with stochasticity induced by model parameter distributions P. (Section 6.1) D.1 Meta Training Data Definition (Appendix D.1 describes generating data for various ODE systems based on sampled parameters, not using pre-existing public datasets).
Dataset Splits	Yes	Excluding GP, each model was evaluated on 104 random systems, each consisting of 100 trajectories to predict in a minibatch fashion. As it is computationally infeasible to evaluate GPs on the same scale, we used a random subset of the test set, 5,000 trajectories.
Hardware Specification	Yes	We measure run time on the Lotka-Volterra (d = 2) problem using an NVIDIA A40 GPU.
Software Dependencies	Yes	All models are implemented using Flax (Heek et al., 2023) and are open source, available in: https://github.com/TsingQAQ/SANODEP. The optimization framework is based on Trieste (Picheny et al., 2023). ODE solver: Dopri5 with rtol = 1e-5 and atol = 1e-5. We utilize trust region-based constraint optimization available in SciPy. The implementation also utilizes the parametric sampling approach of the GPJax (Pinder & Dodd, 2022) library.
Experiment Setup	Yes	In all of our subsequent experiments, we use M_min = 0, M_max = 10, N_x0 = 100, N_sys = 20, N_grid = 100, m_min = 1, m_max = 10, n_min = 0, n_max = 45. (Appendix B.2) Model Hyperparameters: Encoder ϕr output dimension r: 50. ODE solver: Dopri5 with rtol = 1e-5 and atol = 1e-5. (Appendix A.2)
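
The table reports a Dopri5 ODE solver with rtol and atol both read here as 1e-5, and a two-dimensional Lotka-Volterra benchmark. The sketch below illustrates what those two tolerances control in an adaptive Dormand-Prince 5(4) integrator. It is a minimal pure-Python illustration, not the authors' Flax implementation: the function names, the Lotka-Volterra parameter values (all set to 1), the initial state, the initial step size, and the step-size controller constants are assumptions chosen for the example.

```python
import math

# Dormand-Prince 5(4) Butcher tableau (standard published coefficients).
A = [
    [],
    [1/5],
    [3/40, 9/40],
    [44/45, -56/15, 32/9],
    [19372/6561, -25360/2187, 64448/6561, -212/729],
    [9017/3168, -355/33, 46732/5247, 49/176, -5103/18656],
    [35/384, 0.0, 500/1113, 125/192, -2187/6784, 11/84],
]
B5 = A[6] + [0.0]  # 5th-order weights (identical to the last stage row)
B4 = [5179/57600, 0.0, 7571/16695, 393/640, -92097/339200, 187/2100, 1/40]


def lotka_volterra(t, y, alpha=1.0, beta=1.0, gamma=1.0, delta=1.0):
    """Two-state Lotka-Volterra right-hand side (parameter values are assumed)."""
    prey, pred = y
    return [alpha * prey - beta * prey * pred, delta * prey * pred - gamma * pred]


def invariant(y):
    """Quantity conserved by the system above when all four parameters equal 1."""
    return y[0] - math.log(y[0]) + y[1] - math.log(y[1])


def dopri5(f, t0, t1, y0, rtol=1e-5, atol=1e-5, h=0.1):
    """Integrate y' = f(t, y) from t0 to t1 with adaptive step-size control."""
    t, y = t0, list(y0)
    ts, ys = [t], [y]
    n = len(y)
    while t < t1:
        final = h >= t1 - t          # clamp the last step so we land on t1
        if final:
            h = t1 - t
        k = []
        for row in A:                # stage times equal the row sums of A
            yi = [y[j] + h * sum(a * kj[j] for a, kj in zip(row, k)) for j in range(n)]
            k.append(f(t + h * sum(row), yi))
        y5 = [y[j] + h * sum(b * kj[j] for b, kj in zip(B5, k)) for j in range(n)]
        y4 = [y[j] + h * sum(b * kj[j] for b, kj in zip(B4, k)) for j in range(n)]
        # Mixed tolerance: atol bounds error near zero, rtol scales with |y|.
        scale = [atol + rtol * max(abs(y[j]), abs(y5[j])) for j in range(n)]
        err = math.sqrt(sum(((y5[j] - y4[j]) / scale[j]) ** 2 for j in range(n)) / n)
        if err <= 1.0:               # accept the step, advance with the 5th-order value
            t = t1 if final else t + h
            y = y5
            ts.append(t)
            ys.append(y)
        # Grow or shrink the step; the 0.9, 0.2, and 5.0 constants are assumed.
        h *= 5.0 if err == 0.0 else min(5.0, max(0.2, 0.9 * err ** -0.2))
    return ts, ys


ts, ys = dopri5(lotka_volterra, 0.0, 10.0, [2.0, 1.0])
print(len(ts), ys[-1], invariant(ys[-1]) - invariant(ys[0]))
```

The printed drift of the conserved quantity gives a rough check on accuracy: tightening rtol and atol should shrink it at the cost of more accepted steps.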