Variational Learning on Aggregate Outputs with Gaussian Processes

Authors: Ho Chung Law, Dino Sejdinovic, Ewan Cameron, Tim Lucas, Seth Flaxman, Katherine Battle, Kenji Fukumizu

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We apply our framework to a challenging and important problem, the fine-scale spatial modelling of malaria incidence, with over 1 million observations. Our contributions can be summarised as follows. A general framework is developed... In experiments, it is demonstrated that the proposed methods can scale to dataset sizes of more than 1 million observations. We thoroughly investigate an application of the developed methodology to disease mapping from coarse measurements, where the observation model is Poisson, giving encouraging results."
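For context, the aggregate observation model this quote alludes to treats each bag's count as Poisson with a rate given by a (population-weighted) sum of individual-level intensities. A minimal, purely illustrative sketch, with names and weights of our choosing rather than the paper's notation:

```python
# Illustrative sketch of an aggregate Poisson observation model:
# a bag-level count is Poisson-distributed with rate equal to the
# weighted sum of individual-level rates within the bag.
import numpy as np

rng = np.random.default_rng(0)
rates = rng.gamma(shape=2.0, scale=1.0, size=20)   # individual-level rates (illustrative)
weights = rng.uniform(0.5, 1.5, size=20)           # e.g. population weights (illustrative)
bag_rate = np.sum(weights * rates)                 # aggregate rate for one bag
bag_count = rng.poisson(bag_rate)                  # observed aggregate output
```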
Researcher Affiliation | Academia | Ho Chung Leon Law (University of Oxford), Dino Sejdinovic (University of Oxford), Ewan Cameron (University of Oxford), Tim C. D. Lucas (University of Oxford), Seth Flaxman (Imperial College London), Katherine Battle (University of Oxford), Kenji Fukumizu (Institute of Statistical Mathematics)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/hcllaw/VBAgg.
Open Datasets | Yes | "We first demonstrate our method on the swiss roll dataset... The swiss roll manifold function (for sampling) can be found in the Python scikit-learn package."
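For reference, a minimal sketch of sampling the swiss roll via scikit-learn's make_swiss_roll, the function the response points to; the sample size and noise level below are illustrative choices, not the paper's settings:

```python
# Sample points from the swiss roll manifold with scikit-learn.
from sklearn.datasets import make_swiss_roll

# X is an (n_samples, 3) array of points on the manifold; t is the
# univariate position along the roll, often used to derive labels.
X, t = make_swiss_roll(n_samples=1000, noise=0.0, random_state=0)
print(X.shape, t.shape)  # (1000, 3) (1000,)
```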
Dataset Splits | Yes | "We split the dataset into 4 parts, namely train, early-stop, validation and test set. We consider 576 bags for train, 95 bags each for validation and early-stop, with 191 bags for testing, with different splits across different trials, selecting them to ensure distributions of labels are similar across sets."
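A minimal sketch of the reported bag-level split (576 / 95 / 95 / 191 bags); the paper additionally selects splits so that label distributions are similar across sets, which is simplified here to a single random permutation:

```python
# Bag-level split into train / early-stop / validation / test,
# using the bag counts reported in the paper (576 + 95 + 95 + 191 = 957).
import numpy as np

def split_bags(n_bags=957, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_bags)
    sizes = {"train": 576, "early_stop": 95, "validation": 95, "test": 191}
    splits, start = {}, 0
    for name, size in sizes.items():
        splits[name] = idx[start:start + size]
        start += size
    return splits

splits = split_bags()
print({k: len(v) for k, v in splits.items()})
```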
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) for its experiments, mentioning only the use of TensorFlow.
Software Dependencies | No | The paper mentions TensorFlow and Adam [12] but does not specify version numbers for these or any other software libraries, which would be needed for reproducibility.
Experiment Setup | Yes | "We implement our models in TensorFlow and use SGD with Adam [12] to optimise their respective objectives, and we split the dataset into 4 parts, namely train, early-stop, validation and test set... The validation set is used for parameter tuning of any regularisation scaling, as well as learning rate, layer size and multiple initialisations. For the choice of k for VBAgg and Nyström, we use the RBF kernel, with the bandwidth parameter learnt. For landmark locations, we use the K-means++ algorithm."
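To make the setup concrete, here is a hedged sketch of its main ingredients (K-means++ landmark locations, an RBF kernel with a learnable bandwidth, and Adam) using scikit-learn and TensorFlow. It is illustrative only, with a placeholder objective and placeholder hyperparameters; the authors' actual implementation is at https://github.com/hcllaw/VBAgg.

```python
# Illustrative sketch: K-means++ landmarks, RBF kernel with learnable
# bandwidth, optimised with Adam in TensorFlow (not the paper's code).
import numpy as np
import tensorflow as tf
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)).astype(np.float32)  # stand-in inputs

# Landmark (inducing point) locations via K-means++, as in the paper.
landmarks = KMeans(n_clusters=50, init="k-means++",
                   n_init=10, random_state=0).fit(X).cluster_centers_
landmarks = tf.constant(landmarks, dtype=tf.float32)

# RBF kernel with a learnable bandwidth (log-parameterised for positivity).
log_bandwidth = tf.Variable(0.0)

def rbf_kernel(A, B):
    sq_dists = tf.reduce_sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return tf.exp(-sq_dists / (2.0 * tf.exp(log_bandwidth) ** 2))

# Adam, as used for all objectives in the paper; the learning rate here
# is a placeholder (the paper tunes it on the validation set).
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

with tf.GradientTape() as tape:
    K_uu = rbf_kernel(landmarks, landmarks)
    loss = -tf.reduce_mean(K_uu)  # placeholder; the paper optimises a variational bound
grads = tape.gradient(loss, [log_bandwidth])
optimizer.apply_gradients(zip(grads, [log_bandwidth]))
```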