Stochastic Differential Equations with Variational Wishart Diffusions
Authors: Martin Jørgensen, Marc Deisenroth, Hugh Salimbeni
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide experimental evidence that modelling diffusion often improves performance and that this randomness in the differential equation can be essential to avoid overfitting. We evaluate the presented model in both regression and a dynamical setup. In both instances, we use baselines that are similar to our model to more easily distinguish the influence the diffusion has on the experiments. We evaluate on a well-studied regression benchmark and on a higher-dimensional dynamical dataset. |
| Researcher Affiliation | Collaboration | 1Department for Mathematics and Computer Science, Technical University of Denmark 2Department of Computer Science, University College London 3G-Research. Correspondence to: Martin Jørgensen <marjor@dtu.dk>. |
| Pseudocode | No | The paper describes algorithms and models in text and equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is publicly available at: https://github.com/JorgensenMart/Wishart-priored-SDE. |
| Open Datasets | Yes | We evaluate our dynamical model on atmospheric air-quality data from Beijing (Zhang et al., 2017). We use the first two years of this dataset for training and aim to forecast into the first 48 hours of 2016. Full dataset available at https://archive.ics.uci.edu/ml/datasets/Beijing+Multi-Site+Air-Quality+Data. (A date-based split sketch follows the table.) |
| Dataset Splits | Yes | Figure 2 shows the results on eight UCI benchmark datasets over 20 train-test splits (90/10). (A split-generation sketch follows the table.) |
| Hardware Specification | No | The paper does not explicitly mention the specific hardware (e.g., GPU/CPU models) used for running the experiments. |
| Software Dependencies | No | The paper mentions using the Adam-optimiser but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | In all experiments, we choose 100 inducing points for the variational distributions, all of which are Gaussians. All models are trained for 50000 iterations with a mini-batch size of 2000, or the number of samples in the data if smaller. In all instances, the first 10000 iterations are warm-starting the final layer GP g, keeping all other parameters fixed. We use the Adam-optimiser with a step-size of 0.01. The remaining 40000 iterations (SGP excluded) are updating again with Adam with a more cautious step-size of 0.001. For the diff WGP, the first 4000 of these are warm-starting the KL-terms associated with the flow to speed up convergence. (A sketch of this two-stage schedule follows the table.) |
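
The 20 train-test splits (90/10) referenced in the Dataset Splits row are not released with the paper. A minimal sketch of how such splits are commonly generated for the UCI regression benchmarks, assuming a generic feature matrix `X` and target vector `y`, is:

```python
import numpy as np

def make_splits(X, y, n_splits=20, train_frac=0.9, seed=0):
    """Generate `n_splits` random train/test splits with `train_frac` of the data for training."""
    rng = np.random.RandomState(seed)
    n = X.shape[0]
    n_train = int(np.floor(train_frac * n))
    splits = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        train_idx, test_idx = perm[:n_train], perm[n_train:]
        splits.append(((X[train_idx], y[train_idx]),
                       (X[test_idx], y[test_idx])))
    return splits
```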
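The forecasting split described in the Open Datasets row (first two years of the data for training, first 48 hours of 2016 as the forecast target) can be reproduced along the following lines. The file name, column layout, and datetime handling below are assumptions about the UCI download, not the authors' preprocessing:

```python
import pandas as pd

# Hypothetical: a single CSV assembled from the per-station files in the UCI archive,
# with "year", "month", "day", "hour" columns plus the pollutant measurements.
df = pd.read_csv("beijing_air_quality.csv")
df["timestamp"] = pd.to_datetime(df[["year", "month", "day", "hour"]])
df = df.sort_values("timestamp")

start = df["timestamp"].min()
# First two years of the dataset for training.
train = df[df["timestamp"] < start + pd.DateOffset(years=2)]
# First 48 hours of 2016 as the forecasting target.
forecast = df[(df["timestamp"] >= "2016-01-01") & (df["timestamp"] < "2016-01-03")]
```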
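The two-stage optimisation schedule in the Experiment Setup row (10000 Adam iterations at step-size 0.01 updating only the final-layer GP g, then 40000 iterations at step-size 0.001 on all parameters, with mini-batches of up to 2000 points) could be sketched as below. The names `model.final_layer_variables`, `model.trainable_variables`, and `model.negative_elbo` are hypothetical placeholders, not the API of the authors' repository:

```python
import tensorflow as tf

def train(model, training_batches):
    """Two-stage schedule: warm-start the final-layer GP, then train all parameters."""
    warm_opt = tf.keras.optimizers.Adam(learning_rate=0.01)   # warm-start step-size
    full_opt = tf.keras.optimizers.Adam(learning_rate=0.001)  # main-phase step-size

    for step, batch in enumerate(training_batches):
        if step < 10_000:      # warm-start: update only the final-layer GP g
            variables, optimizer = model.final_layer_variables, warm_opt
        elif step < 50_000:    # main phase: update all parameters
            variables, optimizer = model.trainable_variables, full_opt
        else:
            break
        with tf.GradientTape() as tape:
            loss = model.negative_elbo(batch)  # mini-batch of up to 2000 points
        grads = tape.gradient(loss, variables)
        optimizer.apply_gradients(zip(grads, variables))
```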