Design of Experiments for Model Discrimination Hybridising Analytical and Data-Driven Approaches
Authors: Simon Olofsson, Marc Deisenroth, Ruth Misener
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The GP surrogate method's performance is studied using two different case studies. Case study 1 has analytical model functions, so we can compare how well the new method performs against the classical analytical methods in Section 2.1. Case study 2 has pharmacokinetic-like models consisting of systems of ordinary differential equations. In both case studies we compute the following metrics (a hedged computation sketch follows the table): (A) the average number of additional experiments N − N0 required for all incorrect models to be discarded; (SE) the standard error of the average (A); (S) the success rate, i.e. the proportion of tests in which all incorrect models were discarded; (F) the failure rate, i.e. the proportion of tests in which the correct (data-generating) model was discarded; and (I) the proportion of inconclusive tests (all models deemed inaccurate, or the maximum number of additional experiments Nmax − N0 reached). We compare the design criteria (DC) D_BH, D_BF, and D_AW described in Section 2.1 as well as random uniform sampling (denoted Uni.), and the three criteria for model discrimination (MD) described in Section 2.1: the updated posterior likelihood π_N,i, the χ² adequacy test, and the Akaike information criterion (AIC). |
| Researcher Affiliation | Collaboration | 1Dept. of Computing, Imperial College London, United Kingdom. 2PROWLER.io, United Kingdom. |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | The method described in this paper has been implemented in a Python software package, GPdoemd (2018), publicly available on GitHub under an MIT license: https://github.com/cog-imperial/GPdoemd. |
| Open Datasets | Yes | Case study 1 is from the seminal paper by Buzzi-Ferraris et al. (1984). Case study 2 is from Vanlier et al. (2014). |
| Dataset Splits | No | The paper mentions initial experimental observations (N0) but does not specify traditional training, validation, and test dataset splits. |
| Hardware Specification | No | The paper discusses computational time but does not provide specific hardware details such as CPU/GPU models or memory. |
| Software Dependencies | No | The paper mentions implementation in 'a python software package, GPdoemd (2018)', but does not provide specific version numbers for Python or any other software dependencies. |
| Experiment Setup | Yes | For Case study 1: 'We start each test with N0 = 5 randomly sampled experimental observations, and set a maximum budget of Nmax − N0 = 40 additional experiments' and 'Buzzi-Ferraris et al. (1984) generate the experimental data Dexp from model M1 using θ1,1 = θ1,3 = 0.1 and θ1,2 = θ1,4 = 0.01 and Gaussian noise covariance Σexp = diag(0.35, 2.3e-3).' For Case study 2: 'We follow Vanlier et al. (2014) and let M1 generate the observed experimental data, with random uniformly sampled true model parameters and experimental measurement noise covariance Σexp = 9.0 × 10^-4 · I. Each test is initiated with N0 = 20 observations at random design locations x1, ..., xN0. We set a maximum budget of Nmax − N0 = 100 additional experiments.' A hedged configuration sketch is given after the table. |
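
As referenced in the Research Type row, the following is a minimal sketch of how the discrimination metrics (A), (SE), (S), (F), and (I) could be aggregated over repeated tests. The outcome schema (`n_total`, `status`) and the choice to compute (A) and (SE) over successful tests only are illustrative assumptions, not taken from the paper or from GPdoemd.

```python
import numpy as np

def summarise_tests(outcomes, n0):
    """Aggregate repeated model-discrimination tests into metrics A, SE, S, F, I.

    `outcomes` is a list of dicts with the (hypothetical) keys:
      'n_total' -- total number of experiments when the test ended
      'status'  -- 'success'      (all incorrect models discarded),
                   'failure'      (the data-generating model discarded), or
                   'inconclusive' (all models rejected, or budget Nmax - N0 exhausted)
    """
    n_tests = len(outcomes)
    # Additional experiments N - N0, taken here over successfully discriminated tests only
    extra = np.array([o['n_total'] - n0 for o in outcomes if o['status'] == 'success'],
                     dtype=float)

    A = extra.mean() if extra.size else float('nan')                      # (A) average
    SE = extra.std(ddof=1) / np.sqrt(extra.size) if extra.size > 1 else float('nan')  # (SE)
    S = sum(o['status'] == 'success' for o in outcomes) / n_tests         # (S) success rate
    F = sum(o['status'] == 'failure' for o in outcomes) / n_tests         # (F) failure rate
    I = sum(o['status'] == 'inconclusive' for o in outcomes) / n_tests    # (I) inconclusive rate
    return {'A': A, 'SE': SE, 'S': S, 'F': F, 'I': I}

# Toy usage with made-up outcomes for case study 1 (N0 = 5)
tests = [
    {'n_total': 12, 'status': 'success'},
    {'n_total': 9,  'status': 'success'},
    {'n_total': 45, 'status': 'inconclusive'},
]
print(summarise_tests(tests, n0=5))
```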
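
The setup values quoted in the Experiment Setup row can also be collected as plain configuration objects, as sketched below. The dictionary keys and the `n_outputs` placeholder are illustrative assumptions and do not reflect GPdoemd's actual API.

```python
import numpy as np

# Configuration values quoted from the paper; layout is illustrative only.
case_study_1 = {
    'N0': 5,                                # initial random experimental observations
    'budget': 40,                           # maximum additional experiments, Nmax - N0
    'theta_true': [0.1, 0.01, 0.1, 0.01],   # theta_{1,1}..theta_{1,4} used to generate data from M1
    'Sigma_exp': np.diag([0.35, 2.3e-3]),   # Gaussian measurement-noise covariance
}

n_outputs = 2  # placeholder dimension; the quoted text does not fix it here
case_study_2 = {
    'N0': 20,                               # initial observations at random design locations
    'budget': 100,                          # maximum additional experiments, Nmax - N0
    'Sigma_exp': 9.0e-4 * np.eye(n_outputs),  # measurement-noise covariance 9.0e-4 * I
}
```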