Gaussian Process Conditional Density Estimation

Authors: Vincent Dutordoir, Hugh Salimbeni, James Hensman, Marc Deisenroth

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We illustrate the effectiveness and wide-reaching applicability of our model on a variety of real-world problems, such as spatio-temporal density estimation of taxi drop-offs, non-Gaussian noise modeling, and few-shot learning on omniglot images." (Section 5, Experiments)
Researcher Affiliation | Collaboration | Vincent Dutordoir (1), Hugh Salimbeni (1,2), Marc Peter Deisenroth (1,2), James Hensman (1); affiliations: (1) PROWLER.io, Cambridge, UK; (2) Imperial College London
Pseudocode | No | The paper describes its methods in text but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | "See https://github.com/hughsalimbeni/bayesian_benchmarks for the data." The link is provided explicitly for the data, not for an open-source implementation of the described method.
Open Datasets | Yes | "We apply our model to a New York City taxi dataset ... See https://github.com/hughsalimbeni/bayesian_benchmarks for the data. ... on the omniglot dataset. ... We use 10 UCI regression datasets ... on the MNIST dataset"
Dataset Splits | Yes | "We use a test set of 1000 points, and vary the number of training points to establish the utility of models in both sparse and dense data regimes. We use 1K, 5K and 1M randomly selected training points to evaluate the models in both sparse and dense data regimes. Fig. 3 shows the test log-likelihoods using 20-fold cross validation with 10% test splits. We train the models with N = 2, 4, 8, ..., 512 images per class." (A sketch of the split protocol follows the table.)
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) used for running the experiments are provided in the paper.
Software Dependencies | No | The paper mentions software components such as the Adam optimizer but does not provide version numbers for any libraries, frameworks, or programming languages.
Experiment Setup | Yes | "For training we use the Adam optimizer with an exponentially decaying learning rate starting at 0.01 for the hyperparameters, the inducing inputs and the recognition network parameters. Natural gradient steps of size 0.05 are used for the GP’s variational parameters." "... optimizing for 20K iterations using the Adam optimizer for the hyperparameters and a natural gradient optimizer with step size 0.1 for the Gaussian variational parameters." (A sketch of this optimization setup follows the table.)
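The split protocol quoted under "Dataset Splits" is described only in prose. The sketch below shows one plausible reading of the "20-fold cross validation with 10% test splits" evaluation used for the UCI regression experiments, assuming scikit-learn's ShuffleSplit; the library choice, random seed, array shapes, and the commented-out model calls are illustrative assumptions, not details taken from the paper.

```python
# One plausible reading of "20-fold cross validation with 10% test splits":
# 20 random 90/10 train/test partitions. ShuffleSplit, the seed, and the
# placeholder data/model are assumptions, not details from the paper.
import numpy as np
from sklearn.model_selection import ShuffleSplit

X = np.random.rand(1000, 8)   # stand-in for a UCI regression input matrix
y = np.random.rand(1000)      # stand-in for the regression targets

splitter = ShuffleSplit(n_splits=20, test_size=0.10, random_state=0)

test_log_likelihoods = []
for train_idx, test_idx in splitter.split(X):
    X_train, y_train = X[train_idx], y[train_idx]
    X_test, y_test = X[test_idx], y[test_idx]
    # model = fit_density_model(X_train, y_train)                         # hypothetical training call
    # test_log_likelihoods.append(model.log_density(X_test, y_test).mean())
```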
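The quoted "Experiment Setup" amounts to a two-optimizer training loop: Adam with a decaying learning rate for the hyperparameters, inducing inputs, and recognition network, plus natural-gradient steps for the Gaussian variational parameters. Below is a minimal sketch assuming a GPflow 2 / TensorFlow 2 stack (the paper does not name its framework); the stand-in SVGP model, toy data, and the decay-schedule constants (decay_steps, decay_rate) are assumptions for illustration, not the authors' actual code.

```python
# Hedged sketch of the reported optimization split, assuming GPflow 2 / TF 2.
# A plain SVGP stands in for the paper's GP-CDE model and recognition network.
import numpy as np
import tensorflow as tf
import gpflow

X = np.random.rand(500, 1)
Y = np.sin(6 * X) + 0.1 * np.random.randn(500, 1)
Z = np.linspace(0.0, 1.0, 20)[:, None]
model = gpflow.models.SVGP(
    kernel=gpflow.kernels.RBF(),
    likelihood=gpflow.likelihoods.Gaussian(),
    inducing_variable=Z,
)

# The Gaussian variational parameters are updated by natural gradients only,
# so they are hidden from the Adam optimizer.
gpflow.set_trainable(model.q_mu, False)
gpflow.set_trainable(model.q_sqrt, False)

# Adam with an exponentially decaying learning rate starting at 0.01
# (decay_steps and decay_rate are assumptions; the paper gives only the start value).
lr = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01, decay_steps=1_000, decay_rate=0.96)
adam = tf.keras.optimizers.Adam(learning_rate=lr)

# Natural-gradient steps of size 0.05 for the GP's variational parameters.
natgrad = gpflow.optimizers.NaturalGradient(gamma=0.05)

loss_fn = model.training_loss_closure((X, Y))

@tf.function
def optimisation_step():
    natgrad.minimize(loss_fn, var_list=[(model.q_mu, model.q_sqrt)])
    with tf.GradientTape() as tape:
        loss = loss_fn()
    grads = tape.gradient(loss, model.trainable_variables)
    adam.apply_gradients(zip(grads, model.trainable_variables))

for _ in range(20_000):  # "optimizing for 20K iterations"
    optimisation_step()
```

The same pattern with gamma=0.1 would correspond to the second quoted setting (natural-gradient step size 0.1 for the Gaussian variational parameters).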