Stochastic Variational Deep Kernel Learning
Authors: Andrew G. Wilson, Zhiting Hu, Russ R. Salakhutdinov, Eric P. Xing
NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show improved performance over stand-alone deep networks, SVMs, and state-of-the-art scalable Gaussian processes on several classification benchmarks, including an airline delay dataset containing 6 million training points, CIFAR, and ImageNet. ... We evaluate our proposed approach, stochastic variational deep kernel learning (SV-DKL), on a wide range of classification problems, including an airline delay task with over 5.9 million data points (Section 5.1), a large and diverse collection of classification problems from the UCI repository (Section 5.2), and image classification benchmarks (Section 5.3). |
| Researcher Affiliation | Academia | Andrew Gordon Wilson* (Cornell University), Zhiting Hu* (CMU), Ruslan Salakhutdinov (CMU), Eric P. Xing (CMU) |
| Pseudocode | No | The paper describes procedures in prose but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We achieve good predictive accuracy and scalability over a wide range of classification tasks, while retaining a straightforward, general purpose, and highly practical probabilistic non-parametric representation, with code available at https://people.orie.cornell.edu/andrew/code. |
| Open Datasets | Yes | on several classification benchmarks, including an airline delay dataset containing 6 million training points, CIFAR, and ImageNet. ... We first consider a large airline dataset consisting of flight arrival and departure details for all commercial flights within the US in 2008. |
| Dataset Splits | No | Following Hensman et al. [11], we selected a hold-out set of 100,000 points uniformly at random, and the results of DNN and SV-DKL are averaged over 5 runs ± one standard deviation. (This protocol is illustrated in the first sketch after this table.) |
| Hardware Specification | Yes | All experiments were performed on a Linux machine with eight 4.0GHz CPU cores, one Tesla K40c GPU, and 32GB RAM. |
| Software Dependencies | No | The paper mentions implementing deep neural networks with Caffe, but no specific version number for Caffe or other software dependencies is provided. |
| Experiment Setup | Yes | We initialized A to be an identity matrix, and optimized in the joint learning procedure to recover cross-dimension correlations from data. We first train a deep neural network using SGD with the softmax loss objective, and rectified linear activation functions. We achieve good performance setting the number of samples T = 1 in Eq. 4 for expectation estimation in variational inference... The SV-DKL joint training was conducted using a large minibatch size of 50,000 to reduce the variance of the stochastic gradient. We used a minibatch size of 5,000 for stochastic gradient training of SV-DKL. (See the configuration sketch after this table.) |
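
The Dataset Splits row quotes a hold-out protocol: 100,000 points selected uniformly at random, with results averaged over 5 runs ± one standard deviation. A minimal sketch of that protocol, assuming NumPy; the function names `holdout_split` and `averaged_over_runs` are hypothetical, not from the authors' released code:

```python
import numpy as np

def holdout_split(n_points, holdout_size=100_000, seed=0):
    """Select a hold-out set uniformly at random; return (train, test) index arrays."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_points)
    return perm[holdout_size:], perm[:holdout_size]

def averaged_over_runs(evaluate, n_runs=5):
    """Report mean and one standard deviation of a scalar metric over repeated runs.

    `evaluate` is any callable taking a seed and returning a scalar score."""
    scores = np.array([evaluate(seed=run) for run in range(n_runs)])
    return scores.mean(), scores.std()
```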
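
The Experiment Setup row lists concrete settings quoted from the paper. The hypothetical configuration below simply collects those values in one place; the key names and `num_gp_outputs` are assumptions for illustration, while the values come from the quoted text:

```python
import numpy as np

num_gp_outputs = 10  # assumed output dimensionality; not specified in the quoted text

sv_dkl_config = {
    # "We initialized A to be an identity matrix"
    "mixing_matrix_A_init": np.eye(num_gp_outputs),
    # "train a deep neural network using SGD with the softmax loss objective,
    #  and rectified linear activation functions"
    "dnn_pretraining": {"optimizer": "SGD", "loss": "softmax", "activation": "relu"},
    # "setting the number of samples T = 1 in Eq. 4"
    "variational_samples_T": 1,
    # "a large minibatch size of 50,000 to reduce the variance of the stochastic gradient"
    "joint_training_minibatch": 50_000,
    # "a minibatch size of 5,000 for stochastic gradient training of SV-DKL"
    "sgd_minibatch": 5_000,
}
```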