Bayesian Learning from Sequential Data using Gaussian Processes with Signature Covariances

Authors: Csaba Toth, Harald Oberhauser

Venue: ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We then combine the resulting GP with LSTMs and GRUs to build larger models that leverage the strengths of each of these approaches and benchmark the resulting GPs on multivariate time series (TS) classification datasets."
Researcher Affiliation | Academia | "Csaba Toth¹, Harald Oberhauser¹ (¹Mathematical Institute, University of Oxford, Oxford, United Kingdom). Correspondence to: Csaba Toth <csaba.toth@maths.ox.ac.uk>, Harald Oberhauser <harald.oberhauser@maths.ox.ac.uk>."
Pseudocode | Yes | Algorithm 1 (computing the inducing covariances K_ZZ) and Algorithm 2 (computing the cross-covariances K_ZX); a rough, simplified illustration of these covariance blocks appears after the table.
Open Source Code | Yes | "Code and benchmarks are publicly available at http://github.com/tgcsaba/GPSig."
Open Datasets | Yes | "We benchmarked these GP models on 16 multivariate TS classification datasets, a collection introduced in (Baydogan, 2015) that has become a semi-standard archive in TS classification... For this experiment, we took the AUSLAN dataset (Dua & Graff, 2017), which consists of n_c = 95 classes for n_X = 1140 training examples."
Dataset Splits | Yes | "The RNN architectures were selected independently for all models by grid search among 6 variants, that is, the number of hidden units from [8, 32, 128] and with or without dropout. For training, early stopping was used with a patience of n = 500 epochs; a learning rate of α = 10⁻³; a minibatch size of 50; as optimizers, Adam (Kingma & Ba, 2014) and Nadam (Dozat, 2015) were employed."
Hardware Specification | Yes | "All experiments were run on a single NVIDIA GeForce GTX 1080 GPU."
Software Dependencies | Yes | "We implemented our models using Python 3.7.3 and GPflow 1.5.0."
Experiment Setup | Yes | "We used n_Z = 500 for all models; further, all use a static kernel in one form or another, which we fixed to be the RBF kernel. The signature kernel was truncated at M = 4, and for GP-Sig p = 1 lags were used; the GP-Sig-RNNs did not use lags... The window size in GP-KConv-1D was set to w = 10. The RNN architectures were selected independently for all models by grid search among 6 variants, that is, the number of hidden units from [8, 32, 128] and with or without dropout. For training, early stopping was used with a patience of n = 500 epochs; a learning rate of α = 10⁻³; a minibatch size of 50; as optimizers, Adam (Kingma & Ba, 2014) and Nadam (Dozat, 2015) were employed." A hypothetical configuration sketch based on these values appears after the table.
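
The Pseudocode row points to Algorithms 1 and 2 of the paper, which compute the inducing covariance block K_ZZ and the cross-covariance block K_ZX for the signature covariance in kernelized form. Those algorithms are not reproduced in this excerpt, so the sketch below is only a rough, assumption-laden illustration of the objects involved: it builds explicit truncated-signature features of raw paths via Chen's identity and forms the corresponding Gram matrices, i.e. it uses the plain (linear) signature kernel, treats the inducing inputs as ordinary paths rather than the paper's inducing tensors, and ignores lags and the static RBF lift.

# Simplified illustration (not the paper's Algorithms 1-2): explicit
# truncated-signature features and the Gram matrices K_ZZ / K_ZX.
import numpy as np


def segment_signature(delta, M):
    """Signature levels 1..M of one linear segment: delta^{(x)m} / m!."""
    levels, term = [], np.array(1.0)                 # level-0 term is the scalar 1
    for m in range(1, M + 1):
        term = np.multiply.outer(term, delta) / m    # dividing by m at each step gives 1/m!
        levels.append(term)
    return levels


def chen_product(a, b, M):
    """Chen's identity: signature levels of the concatenation of two paths."""
    out = []
    for m in range(1, M + 1):
        term = a[m - 1] + b[m - 1]
        for i in range(1, m):                        # cross terms a_i (x) b_{m-i}
            term = term + np.multiply.outer(a[i - 1], b[m - i - 1])
        out.append(term)
    return out


def truncated_signature(path, M):
    """Flattened truncated signature of a path given as an array of shape (length, d)."""
    increments = np.diff(path, axis=0)
    sig = segment_signature(increments[0], M)
    for delta in increments[1:]:
        sig = chen_product(sig, segment_signature(delta, M), M)
    return np.concatenate([level.ravel() for level in sig])


def sig_gram(paths_a, paths_b, M=4):
    """Gram matrix of truncated-signature features (the linear signature kernel)."""
    feats_a = np.stack([truncated_signature(p, M) for p in paths_a])
    feats_b = np.stack([truncated_signature(p, M) for p in paths_b])
    return feats_a @ feats_b.T


# Toy example: covariance blocks over a handful of "inducing" sequences (Z)
# and training sequences (X); sizes are illustrative, not the paper's n_Z = 500.
rng = np.random.default_rng(0)
Z = [np.cumsum(rng.normal(size=(20, 3)), axis=0) for _ in range(5)]
X = [np.cumsum(rng.normal(size=(30, 3)), axis=0) for _ in range(8)]
K_ZZ = sig_gram(Z, Z, M=4)   # shape (5, 5)
K_ZX = sig_gram(Z, X, M=4)   # shape (5, 8)

In the actual models these blocks would feed a sparse variational GP with n_Z = 500 inducing elements; here only the shapes and the role of the truncation level M are meant to carry over.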
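
Separately, the Experiment Setup and Software Dependencies rows fix the sparse-GP training configuration: n_Z = 500 inducing points, an RBF static kernel, minibatches of 50, Adam at learning rate 10⁻³, and GPflow 1.5.0. The GPSig interface itself is not shown in the excerpt, so the following hypothetical sketch wires those values into a generic GPflow 1.x multi-class SVGP on flattened sequences, with a plain RBF kernel standing in for the signature covariance; the sequence length and channel count are placeholders, and the quoted 500-epoch early-stopping patience is not implemented.

# Hypothetical sketch of the quoted training configuration using the generic
# GPflow 1.x API (requires TensorFlow 1.x). A plain RBF kernel on flattened
# sequences stands in for the paper's signature covariance.
import numpy as np
import gpflow

rng = np.random.default_rng(0)
n_train, n_classes = 1140, 95                    # sizes quoted for AUSLAN
seq_len, n_channels = 45, 22                     # placeholder sequence dimensions
X = rng.normal(size=(n_train, seq_len * n_channels))
Y = rng.integers(0, n_classes, size=(n_train, 1)).astype(float)

n_Z = 500                                        # number of inducing points, as in the setup
Z = X[rng.choice(n_train, size=n_Z, replace=False)].copy()

model = gpflow.models.SVGP(
    X, Y,
    kern=gpflow.kernels.RBF(X.shape[1]),         # the static RBF kernel
    likelihood=gpflow.likelihoods.MultiClass(n_classes),
    Z=Z,
    num_latent=n_classes,
    minibatch_size=50,                           # minibatch size from the quoted setup
)

# Adam at learning rate 1e-3; the quoted early stopping (500-epoch patience)
# would require a custom training loop and is omitted from this sketch.
gpflow.train.AdamOptimizer(learning_rate=1e-3).minimize(model, maxiter=1000)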