Probabilistic ODE Solvers with Runge-Kutta Means

Authors: Michael Schober, David K. Duvenaud, Philipp Hennig

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Since Runge-Kutta methods have been extensively studied for over a century [11], it is not necessary to evaluate their estimation performance again. Instead, we focus on an open conceptual question for the further development of probabilistic Runge-Kutta methods: if we accept high convergence order as a prerequisite for choosing a probabilistic model, how should probabilistic ODE solvers continue after the first s steps? Purely from an inference perspective, it seems unnatural to introduce new evaluations of ẋ (as opposed to x) at t_0 + nh for n = 1, 2, .... Also, with the exception of the Euler case, the posterior covariance after s evaluations is of such a form that its renewed use in the next interval will not give Runge-Kutta estimates. Three options suggest themselves:

Naïve chaining: One could simply re-start the algorithm several times as if the previous step had created a novel IVP. This amounts to the classic RK setup. However, it does not produce a joint global posterior probability distribution (Figure 2, left column).

Smoothing: An ad-hoc remedy is to run the algorithm in the naïve chaining mode above, producing N·s gradient observations and N function evaluations, but then to compute a joint posterior distribution by using the first s gradient observations and one function evaluation as described in Section 3, then using the remaining s(N−1) gradients and (N−1) function values as in standard GP inference. The appeal of this approach is that it produces a GP posterior whose mean goes through the RK points (Figure 2, center column). From a probabilistic standpoint, however, it seems contrived. In particular, it produces a very confident posterior covariance, which does not capture global error.

Continuing after s evaluations: Perhaps most natural from the probabilistic viewpoint is to break with the RK framework after the first RK step and simply continue to collect gradient observations, either at RK locations or anywhere else. The strength of this choice is that it produces a continuously growing marginal variance (Figure 2, right). One may perceive the departure from the established RK paradigm as problematic. However, we note again that the core theoretical argument for RK methods is only strictly valid in the first step; the argument for iterative continuation is much weaker.

Figure 2 shows exemplary results for these three approaches on the (stiff) linear IVP ẋ(t) = −1/2 x(t), x(0) = 1. Naïve chaining does not lead to a globally consistent probability distribution. Smoothing does give this global distribution, but the observations of function values create unnatural nodes of certainty in the posterior. The probabilistically most appealing mode of continuing inference directly offers a naturally increasing estimate of global error. At least for this simple test case, it also happens to work better in practice (note the good match to the ground truth in the plots). We have found similar results for other test cases, notably also for non-stiff linear differential equations. Of course, probabilistic continuation breaks with at least the traditional mode of operation for Runge-Kutta methods, so a closer theoretical evaluation is necessary, which we plan for a follow-up publication.

Comparison to square-exponential kernel: Since all theoretical guarantees are given in the form of upper bounds for the RK methods, the application of different GP models might still be favorable in practice. We compared the continuation method from Fig. 2 (right column) to the ad-hoc choice of a square-exponential (SE) kernel model, which was used by Hennig and Hauberg [6] (Fig. 3). For this test case, the GMRK method surpasses the SE-kernel algorithm both in accuracy and calibration: its mean is closer to the true solution than that of the SE method, and its error bar covers the true solution, while the SE method is over-confident. This advantage in calibration is likely due to the more natural choice of the output scale σ² in the GMRK framework. (A minimal numerical sketch of the continuation idea follows the table.)
Researcher Affiliation | Academia | Michael Schober, MPI for Intelligent Systems, Tübingen, Germany (mschober@tue.mpg.de); David Duvenaud, Department of Engineering, Cambridge University (dkd23@cam.ac.uk); Philipp Hennig, MPI for Intelligent Systems, Tübingen, Germany (phennig@tue.mpg.de)
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide any statements or links indicating that open-source code for the described methodology is available.
Open Datasets | No | The paper evaluates its methods on a 'linear IVP ẋ(t) = −1/2 x(t), x(0) = 1', which is a synthetic test case, not a publicly available dataset with specific access information.
Dataset Splits | No | The paper uses a specific initial value problem (IVP) for evaluation, not a dataset with explicit training, validation, or test splits. Therefore, no validation split information is provided.
Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions 'Gaussian processes' and 'square-exponential kernel' as theoretical components but does not provide specific software names with version numbers required to replicate the experiments.
Experiment Setup | Yes | Figure 2 shows exemplary results for these three approaches on the (stiff) linear IVP ẋ(t) = −1/2 x(t), x(0) = 1. All plots use the midpoint method and h = 1. (A sketch of this setup follows directly below the table.)
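
For reference, here is a minimal sketch of the classical explicit midpoint rule on the test problem from the Experiment Setup row. It assumes the extraction-damaged IVP reads ẋ(t) = −1/2 x(t), x(0) = 1 (the sign is reconstructed from the "stiff" description) and uses the stated step size h = 1; the exact solution exp(−t/2) is printed only for comparison.

```python
import numpy as np

def midpoint_step(f, t, x, h):
    """One step of the classical (explicit) midpoint method."""
    k1 = f(t, x)
    k2 = f(t + h / 2, x + h / 2 * k1)
    return x + h * k2

# Reconstructed test problem from the paper's Figure 2 (sign assumed): x' = -x/2, x(0) = 1.
f = lambda t, x: -0.5 * x
t, x, h = 0.0, 1.0, 1.0

for n in range(10):
    x = midpoint_step(f, t, x, h)
    t += h
    print(f"t = {t:4.1f}   midpoint x = {x: .6f}   exact x = {np.exp(-0.5 * t): .6f}")
```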
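The "continuing after s evaluations" option quoted in the Research Type row boils down to GP regression on a single function value plus a growing set of gradient observations. The sketch below illustrates that general idea with a square-exponential kernel and derivative observations, in the spirit of the Hennig and Hauberg baseline rather than the paper's GMRK construction; the kernel hyperparameters, the grid of gradient locations, and the ODE sign are illustrative assumptions, not values taken from the paper. The cross-covariances between x and ẋ are what let a single GP carry both the initial value and the collected gradient information.

```python
import numpy as np

# SE-kernel hyperparameters (assumed for illustration only).
SIGMA2, ELL2 = 1.0, 2.0 ** 2

def k_ff(a, b):
    """cov(x(a), x(b)) for a square-exponential kernel."""
    d = a[:, None] - b[None, :]
    return SIGMA2 * np.exp(-d ** 2 / (2 * ELL2))

def k_fd(a, b):
    """cov(x(a), x'(b)) = d/db k(a, b)."""
    d = a[:, None] - b[None, :]
    return SIGMA2 * np.exp(-d ** 2 / (2 * ELL2)) * d / ELL2

def k_dd(a, b):
    """cov(x'(a), x'(b)) = d^2/(da db) k(a, b)."""
    d = a[:, None] - b[None, :]
    return SIGMA2 * np.exp(-d ** 2 / (2 * ELL2)) * (1.0 / ELL2 - d ** 2 / ELL2 ** 2)

def gram(tf, td):
    """Joint Gram matrix over function observations (tf) and gradient observations (td)."""
    top = np.hstack([k_ff(tf, tf), k_fd(tf, td)])
    bot = np.hstack([k_fd(tf, td).T, k_dd(td, td)])
    return np.vstack([top, bot])

def posterior(t_query, tf, yf, td, yd):
    """GP posterior mean/std at t_query, given values yf at tf and derivatives yd at td."""
    K = gram(tf, td) + 1e-8 * np.eye(len(tf) + len(td))
    kq = np.hstack([k_ff(t_query, tf), k_fd(t_query, td)])
    alpha = np.linalg.solve(K, np.concatenate([yf, yd]))
    mean = kq @ alpha
    var = SIGMA2 - np.sum(kq * np.linalg.solve(K, kq.T).T, axis=1)
    return mean, np.sqrt(np.maximum(var, 0.0))

f = lambda t, x: -0.5 * x                      # reconstructed ODE (sign assumed)
tf, yf = np.array([0.0]), np.array([1.0])      # the single function observation x(0) = 1
td, yd = np.array([]), np.array([])
h = 1.0

# "Continuing" mode: keep collecting gradient observations at new nodes,
# each evaluated at the current posterior mean estimate of x.
for n in range(10):
    t_new = np.array([n * h])
    x_hat, _ = posterior(t_new, tf, yf, td, yd)
    td = np.append(td, t_new)
    yd = np.append(yd, f(t_new[0], x_hat[0]))

m, s = posterior(np.array([5.0, 10.0]), tf, yf, td, yd)
print("posterior mean at t = [5, 10]:", m, "  two-sigma:", 2 * s,
      "  exact:", np.exp(-0.5 * np.array([5.0, 10.0])))
```

Because new gradient nodes keep being added while no further function values are observed, the posterior marginal variance grows with t, which is the qualitative behaviour the quoted passage attributes to the continuation mode (Figure 2, right column).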