Stochastic Differential Equations with Variational Wishart Diffusions
Authors: Martin Jørgensen, Marc Deisenroth, Hugh Salimbeni
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide experimental evidence that modelling diffusion often improves performance and that this randomness in the differential equation can be essential to avoid overfitting. We evaluate the presented model in both regression and a dynamical setup. In both instances, we use baselines that are similar to our model to more easily distinguish the influence the diffusion has on the experiments. We evaluate on a well-studied regression benchmark and on a higher-dimensional dynamical dataset. |
| Researcher Affiliation | Collaboration | 1Department for Mathematics and Computer Science, Technical University of Denmark 2Department of Computer Science, University College London 3G-Research. Correspondence to: Martin Jørgensen <marjor@dtu.dk>. |
| Pseudocode | No | The paper describes algorithms and models in text and equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is publicly available at: https://github.com/JorgensenMart/Wishart-priored-SDE. |
| Open Datasets | Yes | We evaluate our dynamical model on atmospheric air-quality data from Beijing (Zhang et al., 2017). We use the first two years of this dataset for training and aim to forecast into the first 48 hours of 2016. Full dataset available at https://archive.ics.uci.edu/ml/datasets/Beijing+Multi-Site+Air-Quality+Data. (A date-based split sketch follows the table.) |
| Dataset Splits | Yes | Figure 2 shows the results on eight UCI benchmark datasets over 20 train-test splits (90/10). (A split-generation sketch follows the table.) |
| Hardware Specification | No | The paper does not explicitly mention the specific hardware (e.g., GPU/CPU models) used for running the experiments. |
| Software Dependencies | No | The paper mentions using the Adam-optimiser but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | In all experiments, we choose 100 inducing points for the variational distributions, all of which are Gaussians. All models are trained for 50000 iterations with a mini-batch size of 2000, or the number of samples in the data if smaller. In all instances, the first 10000 iterations are warm-starting the final layer GP g, keeping all other parameters fixed. We use the Adam-optimiser with a step-size of 0.01. The remaining 40000 iterations (SGP excluded) are updating again with Adam with a more cautious step-size of 0.001. For the diff WGP, the first 4000 of these are warm-starting the KL-terms associated with the flow to speed up convergence. (A sketch of this two-stage schedule follows the table.) |
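
The 20 train-test splits (90/10) referenced in the Dataset Splits row are not released with the paper. A minimal sketch of how such splits are commonly generated for the UCI regression benchmarks, assuming a generic feature matrix `X` and target vector `y`, is:

```python
import numpy as np

def make_splits(X, y, n_splits=20, train_frac=0.9, seed=0):
    """Generate `n_splits` random train/test splits with `train_frac` of the data for training."""
    rng = np.random.RandomState(seed)
    n = X.shape[0]
    n_train = int(np.floor(train_frac * n))
    splits = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        train_idx, test_idx = perm[:n_train], perm[n_train:]
        splits.append(((X[train_idx], y[train_idx]),
                       (X[test_idx], y[test_idx])))
    return splits
```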
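The forecasting split described in the Open Datasets row (first two years of the data for training, first 48 hours of 2016 as the forecast target) can be reproduced along the following lines. The file name, column layout, and datetime handling below are assumptions about the UCI download, not the authors' preprocessing:

```python
import pandas as pd

# Hypothetical: a single CSV assembled from the per-station files in the UCI archive,
# with "year", "month", "day", "hour" columns plus the pollutant measurements.
df = pd.read_csv("beijing_air_quality.csv")
df["timestamp"] = pd.to_datetime(df[["year", "month", "day", "hour"]])
df = df.sort_values("timestamp")

start = df["timestamp"].min()
# First two years of the dataset for training.
train = df[df["timestamp"] < start + pd.DateOffset(years=2)]
# First 48 hours of 2016 as the forecasting target.
forecast = df[(df["timestamp"] >= "2016-01-01") & (df["timestamp"] < "2016-01-03")]
```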
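The two-stage optimisation schedule in the Experiment Setup row (10000 Adam iterations at step-size 0.01 updating only the final-layer GP g, then 40000 iterations at step-size 0.001 on all parameters, with mini-batches of up to 2000 points) could be sketched as below. The names `model.final_layer_variables`, `model.trainable_variables`, and `model.negative_elbo` are hypothetical placeholders, not the API of the authors' repository:

```python
import tensorflow as tf

def train(model, training_batches):
    """Two-stage schedule: warm-start the final-layer GP, then train all parameters."""
    warm_opt = tf.keras.optimizers.Adam(learning_rate=0.01)   # warm-start step-size
    full_opt = tf.keras.optimizers.Adam(learning_rate=0.001)  # main-phase step-size

    for step, batch in enumerate(training_batches):
        if step < 10_000:      # warm-start: update only the final-layer GP g
            variables, optimizer = model.final_layer_variables, warm_opt
        elif step < 50_000:    # main phase: update all parameters
            variables, optimizer = model.trainable_variables, full_opt
        else:
            break
        with tf.GradientTape() as tape:
            loss = model.negative_elbo(batch)  # mini-batch of up to 2000 points
        grads = tape.gradient(loss, variables)
        optimizer.apply_gradients(zip(grads, variables))
```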