Semi-supervised Deep Kernel Learning: Regression with Unlabeled Data by Minimizing Predictive Variance

Authors: Neal Jean, Sang Michael Xie, Stefano Ermon

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply SSDKL to a variety of real-world regression tasks in the inductive semi-supervised learning setting, beginning with eight datasets from the UCI repository [23]. We also explore the challenging task of predicting local poverty measures from high-resolution satellite imagery [24]. In our reported results, we use the squared exponential or radial basis function kernel. (A sketch of this kernel appears after the table.)
Researcher Affiliation | Academia | Neal Jean, Sang Michael Xie, Stefano Ermon; Department of Computer Science, Stanford University, Stanford, CA 94305; {nealjean, xie, ermon}@cs.stanford.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code and data for reproducing experimental results can be found on GitHub: https://github.com/ermongroup/ssdkl
Open Datasets | Yes | We apply SSDKL to a variety of real-world regression tasks in the inductive semi-supervised learning setting, beginning with eight datasets from the UCI repository [23].
Dataset Splits | Yes | For each dataset, we train on n = {50, 100, 200, 300, 400, 500} labeled examples, retain 1000 examples as the held-out test set, and treat the remaining data as unlabeled examples. Following [29], the labeled data is randomly split 90-10 into training and validation samples, giving a realistically small validation set. (A sketch of this split protocol appears after the table.)
Hardware Specification | No | The paper does not specify any particular hardware (e.g., CPU, GPU, or cloud instance types) used for running the experiments.
Software Dependencies | No | Our SSDKL model is implemented in TensorFlow [25]. No TensorFlow version number is specified.
Experiment Setup | Yes | All kernel hyperparameters are optimized directly through L_semisup, and we use the validation set for early stopping to prevent overfitting and for selecting α ∈ {0.1, 1, 10}. ... we choose a neural network with a similar [d-100-50-50-2] architecture and two-dimensional embedding. ... We use learning rates of 1 × 10^-3 and 0.1 for the neural network and GP parameters respectively and initialize all GP parameters to 1. (A sketch of this objective appears after the table.)
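
For reference, the squared exponential (RBF) kernel mentioned in the Research Type row has the standard form k(x, x') = σ² exp(-||x - x'||² / (2ℓ²)). A minimal NumPy sketch; the hyperparameter names `lengthscale` and `signal_var` are illustrative, not taken from the paper:

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, signal_var=1.0):
    """Squared exponential (RBF) kernel matrix between two point sets.

    X1: (n, d) array, X2: (m, d) array. Returns the (n, m) Gram matrix.
    `lengthscale` and `signal_var` are illustrative hyperparameter names.
    """
    # Pairwise squared Euclidean distances via the expansion
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    return signal_var * np.exp(-0.5 * sq_dists / lengthscale**2)
```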
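The Dataset Splits row describes a concrete protocol: n labeled examples, a 1000-example held-out test set, the remainder treated as unlabeled, and a 90-10 train/validation split of the labeled pool. A sketch of that protocol for a generic (X, y) array pair; the function name `make_splits` is mine, not the authors':

```python
import numpy as np

def make_splits(X, y, n_labeled, n_test=1000, seed=0):
    """Split a dataset as described in the paper: n_labeled labeled
    examples (further split 90-10 into train/validation), a 1000-example
    held-out test set, and the remainder treated as unlabeled."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))

    test_idx = idx[:n_test]
    labeled_idx = idx[n_test:n_test + n_labeled]
    unlabeled_idx = idx[n_test + n_labeled:]

    # A 90-10 split of the labeled pool gives a realistically small
    # validation set, as the quoted text notes.
    n_train = int(0.9 * n_labeled)
    train_idx, val_idx = labeled_idx[:n_train], labeled_idx[n_train:]

    return {
        "train": (X[train_idx], y[train_idx]),
        "val": (X[val_idx], y[val_idx]),
        "test": (X[test_idx], y[test_idx]),
        "unlabeled": X[unlabeled_idx],  # labels discarded
    }
```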
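Finally, the Experiment Setup row refers to the semi-supervised objective L_semisup. A hedged sketch of one plausible form of this objective, consistent with the paper's title ("minimizing predictive variance"): the GP marginal negative log likelihood on labeled data plus α times the mean posterior variance at unlabeled points. It reuses `rbf_kernel` from the first sketch; SSDKL's neural network feature extractor is omitted, and `noise_var` is an illustrative hyperparameter name:

```python
import numpy as np

def semisup_objective(X_lab, y_lab, X_unl, alpha=1.0,
                      lengthscale=1.0, signal_var=1.0, noise_var=1.0):
    """Supervised GP negative log likelihood on labeled data plus
    alpha * mean posterior predictive variance at unlabeled points.
    A sketch only: the deep feature extractor of SSDKL is omitted."""
    n = len(X_lab)
    K = rbf_kernel(X_lab, X_lab, lengthscale, signal_var) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)

    # Supervised term: GP marginal negative log likelihood,
    # 0.5 y^T K^{-1} y + 0.5 log|K| + 0.5 n log(2 pi).
    beta = np.linalg.solve(L.T, np.linalg.solve(L, y_lab))
    nll = (0.5 * y_lab @ beta
           + np.log(np.diag(L)).sum()
           + 0.5 * n * np.log(2 * np.pi))

    # Unsupervised term: posterior variance at each unlabeled point,
    # var(x*) = k(x*, x*) - k(x*, X) K^{-1} k(X, x*).
    K_star = rbf_kernel(X_unl, X_lab, lengthscale, signal_var)  # (m, n)
    v = np.linalg.solve(L, K_star.T)                            # (n, m)
    var = signal_var - np.sum(v**2, axis=0)                     # (m,)

    return nll + alpha * var.mean()
```

Per the quoted setup, an objective of this kind would be minimized by gradient descent over both the neural network and GP parameters, with learning rates of 1 × 10^-3 and 0.1 respectively, selecting α ∈ {0.1, 1, 10} on the validation set with early stopping.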