Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Semi-supervised Deep Kernel Learning: Regression with Unlabeled Data by Minimizing Predictive Variance
Authors: Neal Jean, Sang Michael Xie, Stefano Ermon
NeurIPS 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply SSDKL to a variety of real-world regression tasks in the inductive semi-supervised learning setting, beginning with eight datasets from the UCI repository [23]. We also explore the challenging task of predicting local poverty measures from high-resolution satellite imagery [24]. In our reported results, we use the squared exponential or radial basis function kernel. |
| Researcher Affiliation | Academia | Neal Jean, Sang Michael Xie, Stefano Ermon; Department of Computer Science, Stanford University, Stanford, CA 94305; EMAIL |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | code and data for reproducing experimental results can be found on GitHub (https://github.com/ermongroup/ssdkl). |
| Open Datasets | Yes | We apply SSDKL to a variety of real-world regression tasks in the inductive semi-supervised learning setting, beginning with eight datasets from the UCI repository [23]. |
| Dataset Splits | Yes | For each dataset, we train on n = {50, 100, 200, 300, 400, 500} labeled examples, retain 1000 examples as the hold out test set, and treat the remaining data as unlabeled examples. Following [29], the labeled data is randomly split 90-10 into training and validation samples, giving a realistically small validation set. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., CPU, GPU, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | Our SSDKL model is implemented in TensorFlow [25]. No version number for TensorFlow is specified. |
| Experiment Setup | Yes | All kernel hyperparameters are optimized directly through L_semisup, and we use the validation set for early stopping to prevent overfitting and for selecting α ∈ {0.1, 1, 10}. ... we choose a neural network with a similar [d-100-50-50-2] architecture and two-dimensional embedding. ... We use learning rates of 1 × 10⁻³ and 0.1 for the neural network and GP parameters respectively and initialize all GP parameters to 1. |
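
The split protocol quoted under Dataset Splits can be sketched roughly as follows. This is an illustrative reading of the quoted text, not the authors' code: the function name `split_indices`, the seed handling, the example dataset size of 5000, and the choice to draw the test set before the labeled set are all assumptions.

```python
import random


def split_indices(n_total, n_labeled, n_test=1000, val_frac=0.1, seed=0):
    """Partition example indices into train/val/unlabeled/test sets.

    Hypothetical helper: mirrors the quoted protocol (1000 held-out test
    examples, n labeled examples split 90-10 into train/validation, and
    everything else treated as unlabeled). The ordering of the draws is
    an assumption, not something the report specifies.
    """
    if n_total < n_labeled + n_test:
        raise ValueError("dataset too small for the requested split")

    # Shuffle all indices once with a fixed seed for repeatability.
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)

    test = indices[:n_test]
    labeled = indices[n_test:n_test + n_labeled]
    unlabeled = indices[n_test + n_labeled:]

    # 90-10 split of the labeled pool into training and validation.
    n_val = int(round(val_frac * n_labeled))
    val = labeled[:n_val]
    train = labeled[n_val:]

    return {"train": train, "val": val, "unlabeled": unlabeled, "test": test}


# Example: a hypothetical dataset of 5000 rows with n = 100 labeled examples.
splits = split_indices(n_total=5000, n_labeled=100)
sizes = {name: len(idx) for name, idx in splits.items()}
```

With n = 100 this leaves only 10 validation examples, which is the "realistically small validation set" the quoted passage refers to.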