Stochastic Variational Deep Kernel Learning

Authors: Andrew G. Wilson, Zhiting Hu, Russ R. Salakhutdinov, Eric P. Xing

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show improved performance over stand-alone deep networks, SVMs, and state-of-the-art scalable Gaussian processes on several classification benchmarks, including an airline delay dataset containing 6 million training points, CIFAR, and ImageNet." "We evaluate our proposed approach, stochastic variational deep kernel learning (SV-DKL), on a wide range of classification problems, including an airline delay task with over 5.9 million data points (section 5.1), a large and diverse collection of classification problems from the UCI repository (section 5.2), and image classification benchmarks (section 5.3)." (Section 5, Experiments)
Researcher Affiliation | Academia | Andrew Gordon Wilson* (Cornell University), Zhiting Hu* (CMU), Ruslan Salakhutdinov (CMU), Eric P. Xing (CMU)
Pseudocode | No | The paper describes its procedures in prose but does not include structured pseudocode or an algorithm block. (An illustrative sketch of the deep kernel construction it builds on appears below the table.)
Open Source Code | Yes | "We achieve good predictive accuracy and scalability over a wide range of classification tasks, while retaining a straightforward, general purpose, and highly practical probabilistic non-parametric representation, with code available at https://people.orie.cornell.edu/andrew/code."
Open Datasets | Yes | "...on several classification benchmarks, including an airline delay dataset containing 6 million training points, CIFAR, and ImageNet." "We first consider a large airline dataset consisting of flight arrival and departure details for all commercial flights within the US in 2008."
Dataset Splits | No | "Following Hensman et al. [11], we selected a hold-out set of 100,000 points uniformly at random, and the results of DNN and SV-DKL are averaged over 5 runs ± one standard deviation."
Hardware Specification | Yes | "All experiments were performed on a Linux machine with eight 4.0GHz CPU cores, one Tesla K40c GPU, and 32GB RAM."
Software Dependencies | No | The paper mentions implementing deep neural networks with Caffe, but no version number for Caffe or any other software dependency is provided.
Experiment Setup | Yes | "We initialized A to be an identity matrix, and optimized it in the joint learning procedure to recover cross-dimension correlations from data." "We first train a deep neural network using SGD with the softmax loss objective, and rectified linear activation functions." "We achieve good performance setting the number of samples T = 1 in Eq. 4 for expectation estimation in variational inference." "The SV-DKL joint training was conducted using a large minibatch size of 50,000 to reduce the variance of the stochastic gradient." "We used a minibatch size of 5,000 for stochastic gradient training of SV-DKL." (A skeleton of this two-stage schedule appears below the table.)
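
Since the paper provides no algorithm block (see the Pseudocode row above), here is a minimal runnable Python/NumPy sketch of the deep kernel construction that SV-DKL builds on: a base RBF kernel applied to the outputs of a neural-network feature extractor, k_deep(x, x') = k_rbf(g_w(x), g_w(x')). This is an orientation aid under stated assumptions, not the authors' implementation; the two-layer ReLU network, its dimensions, and the helper names (make_feature_extractor, rbf_kernel, deep_kernel) are illustrative.

    import numpy as np

    def relu(z):
        return np.maximum(z, 0.0)

    def make_feature_extractor(rng, d_in, d_hidden, d_out):
        # Illustrative two-layer ReLU network g_w; in the paper the weights
        # are learned, here they are random stand-ins.
        W1 = rng.standard_normal((d_in, d_hidden)) / np.sqrt(d_in)
        W2 = rng.standard_normal((d_hidden, d_out)) / np.sqrt(d_hidden)
        return lambda X: relu(relu(X @ W1) @ W2)

    def rbf_kernel(Z1, Z2, lengthscale=1.0, variance=1.0):
        # Base kernel k(z, z') = s^2 * exp(-||z - z'||^2 / (2 * l^2)).
        sq_dists = ((Z1[:, None, :] - Z2[None, :, :]) ** 2).sum(-1)
        return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

    def deep_kernel(X1, X2, g):
        # Deep kernel: apply the base kernel to learned features g(x).
        return rbf_kernel(g(X1), g(X2))

    rng = np.random.default_rng(0)
    g = make_feature_extractor(rng, d_in=8, d_hidden=32, d_out=2)
    X = rng.standard_normal((5, 8))
    K = deep_kernel(X, X, g)  # 5x5 symmetric PSD Gram matrix
    print(K.shape)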
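
The Experiment Setup row describes a two-stage schedule: pre-train the DNN with SGD on the softmax (cross-entropy) loss with ReLU activations, then jointly optimize the network together with the mixing matrix A (initialized to the identity) on minibatches. The PyTorch skeleton below sketches that schedule only; the data, network sizes, and the placeholder svdkl_variational_loss are assumptions, since the paper's actual objective is the stochastic variational bound of its Eq. 4, implemented by the authors with Caffe.

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    # Illustrative stand-ins; sizes and data are assumptions, not the paper's.
    net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 10))
    X, y = torch.randn(20000, 8), torch.randint(0, 10, (20000,))
    loader = DataLoader(TensorDataset(X, y), batch_size=5000, shuffle=True)
    ce = nn.CrossEntropyLoss()

    # Stage 1: pre-train the DNN alone with SGD on the softmax loss.
    opt = torch.optim.SGD(net.parameters(), lr=0.1)
    for xb, yb in loader:
        opt.zero_grad()
        ce(net(xb), yb).backward()
        opt.step()

    # Mixing matrix A over the (here, stand-in) GP outputs, initialized
    # to the identity as in the quoted setup.
    A = nn.Parameter(torch.eye(10))

    def svdkl_variational_loss(outputs, yb):
        # Hypothetical placeholder: mixes outputs through A and applies the
        # softmax loss. The paper instead optimizes a stochastic variational
        # bound, estimated with T = 1 samples (its Eq. 4).
        return ce(outputs @ A, yb)

    # Stage 2: joint minibatch training of the network and A.
    joint_opt = torch.optim.SGD(list(net.parameters()) + [A], lr=0.01)
    for xb, yb in loader:
        joint_opt.zero_grad()
        svdkl_variational_loss(net(xb), yb).backward()
        joint_opt.step()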