Scalable Multi-Class Gaussian Process Classification using Expectation Propagation

Authors: Carlos Villacampa-Calvo, Daniel Hernández-Lobato

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We compare empirically this method with alternative approaches that approximate the required computations using variational inference. The results show that it performs similar or even better than these techniques..." and, from Section 4 (Experiments): "We evaluate the performance of the method proposed in Section 2.2."
Researcher Affiliation | Academia | "Universidad Autónoma de Madrid, Madrid, Spain. Correspondence to: Carlos Villacampa-Calvo <carlos.villacampa@uam.es>."
Pseudocode | No | The paper describes the method and its steps but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | "All methods are codified in the R language (the source code is in the supplementary material)."
Open Datasets | Yes | "We evaluate the performance of each method on 8 datasets from the UCI repository (Lichman, 2013)." "We evaluate the performance of each method on the MNIST dataset (LeCun et al., 1998)." "A last experiment considers all flights within the USA between 01/2008 and 04/2008 (http://stat-computing.org/dataexpo/2009)."
Dataset Splits | No | "We use 90% of the data for training and 10% for testing, except for Satellite which is fairly big, where we use 20% for training and 80% for testing. In Waveform, which is synthetic, we generate 1000 instances and split them into 30% for training and 70% for testing." (See the split sketch below the table.)
Hardware Specification | No | The paper acknowledges the use of computing facilities but does not provide specific hardware details (e.g., CPU/GPU models, memory).
Software Dependencies | No | The paper states "All methods are codified in the R language" and mentions using ADAM, but does not provide specific version numbers for software libraries or dependencies.
Experiment Setup | Yes | "All methods are trained for 250 iterations using gradient ascent (GFITC and VI use L-BFGS, EP and SEP use an adaptive learning rate described in the supplementary material)." "We consider three values for M, the number of inducing points. Namely 5%, 10% and 20% of the number of training instances." "VI is trained using stochastic gradients for 2000 epochs... and 100 as the mini-batch size." "M = 200 inducing points and mini-batches with 200 instances." (See the configuration sketch below the table.)
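
A minimal sketch of the split protocol quoted in the Dataset Splits row, assuming the per-dataset arrays (`X`, `y`, `X_sat`, `y_sat`) are already loaded. scikit-learn's `train_test_split` stands in for whatever partitioning the authors' R code performs, and `make_waveform` is a hypothetical generator for the synthetic Waveform data, which the paper does not specify.

```python
from sklearn.model_selection import train_test_split

SEED = 0  # illustrative fixed seed; the paper does not report seeds

# Default protocol for the UCI datasets: 90% train / 10% test.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.10, random_state=SEED)

# Satellite is fairly big, so the ratio is inverted: 20% train / 80% test.
Xs_tr, Xs_te, ys_tr, ys_te = train_test_split(
    X_sat, y_sat, train_size=0.20, random_state=SEED)

# Waveform is synthetic: generate 1000 instances, then 30% train / 70% test.
X_wav, y_wav = make_waveform(n_samples=1000)  # hypothetical generator
Xw_tr, Xw_te, yw_tr, yw_te = train_test_split(
    X_wav, y_wav, train_size=0.30, random_state=SEED)
```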
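The Experiment Setup row likewise translates into a concrete training configuration. The sketch below mirrors the stochastic VI baseline's reported settings (inducing points as 5% of the training instances, ADAM, mini-batches of 100, 2000 epochs) using GPflow's `SVGP` model as a stand-in; it is not the authors' EP method, whose reference implementation is in R, and the GPflow-specific choices (`SquaredExponential` kernel, `Softmax` likelihood, random inducing-point initialisation) are assumptions about a plausible reimplementation.

```python
import numpy as np
import tensorflow as tf
import gpflow  # stand-in VI model; the paper's own code is written in R

# X_train: (N, D) float array; y_train: (N, 1) integer class labels (assumed loaded).
N = X_train.shape[0]
num_classes = int(y_train.max()) + 1

# Inducing points: 5% of the training instances (the paper also tries 10% and 20%),
# initialised here as a random subset of the training inputs.
M = int(0.05 * N)
Z = X_train[np.random.choice(N, M, replace=False)].copy()

model = gpflow.models.SVGP(
    kernel=gpflow.kernels.SquaredExponential(),
    likelihood=gpflow.likelihoods.Softmax(num_classes),
    inducing_variable=Z,
    num_latent_gps=num_classes,
)

# Stochastic optimisation with ADAM: mini-batches of 100 for 2000 epochs,
# matching the settings quoted above for the VI baseline.
batch_size = 100
data = (tf.data.Dataset.from_tensor_slices((X_train, y_train))
        .repeat().shuffle(N).batch(batch_size))
loss_fn = model.training_loss_closure(iter(data))
optimizer = tf.optimizers.Adam()

steps_per_epoch = max(N // batch_size, 1)
for _ in range(2000 * steps_per_epoch):
    optimizer.minimize(loss_fn, model.trainable_variables)
```

The non-stochastic runs in the paper (250 gradient-ascent iterations, with L-BFGS for GFITC and VI) could be approximated with GPflow's `gpflow.optimizers.Scipy` instead of the ADAM loop; that variant is omitted here for brevity.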