Bayesian dark knowledge

Authors: Anoop Korattikara Balan, Vivek Rathod, Kevin P. Murphy, Max Welling

NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we compare SGLD and distilled SGLD with other approximate inference methods, including the plugin approximation using SGD, the PBP approach of [HLA15], the BBB approach of [BCKW15], and Hamiltonian Monte Carlo (HMC), which is considered the gold standard for MCMC for neural nets. We implemented SGD and SGLD using the Torch library (torch.ch). For HMC, we used Stan (mc-stan.org). We perform this comparison for various classification and regression problems, as summarized in Table 1.
Researcher Affiliation | Collaboration | Anoop Korattikara, Vivek Rathod, Kevin Murphy (Google Research, {kbanoop, rathodv, kpmurphy}@google.com); Max Welling (University of Amsterdam, m.welling@uva.nl)
Pseudocode | Yes | Algorithm 1: Distilled SGLD
Open Source Code | No | The paper does not provide a direct link to open-source code for the described methodology or an explicit statement about its availability.
Open Datasets | Yes | Now we consider the MNIST digit classification problem, which has N = 60k examples, 10 classes, and D = 784 features. The only preprocessing we do is divide the pixel values by 126 (as in [BCKW15]).
Dataset Splits | Yes | MNIST digit classification problem... We train only on 50K datapoints and use the remaining 10K for tuning hyperparameters. This means our results are not strictly comparable to a lot of published work, which uses the whole dataset for training; however, the difference is likely to be small. Boston housing dataset... This has N = 506 data points (456 training, 50 testing)
Hardware Specification | No | The paper mentions training times and memory requirements but does not specify any particular GPU or CPU models, or other detailed hardware specifications used for the experiments.
Software Dependencies | No | The paper mentions "Implemented SGD and SGLD using the Torch library (torch.ch)" and "For HMC, we used Stan (mc-stan.org)", but it does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | Next we fit this model by SGLD, using these hyper-parameters: fixed learning rate of η_t = 4 × 10^-6, thinning interval τ = 100, burn-in iterations B = 1000, prior precision λ = 1, minibatch size M = 100. (A sketch of this update, together with the distillation step of Algorithm 1, appears below the table.)
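
As context for the "Pseudocode" and "Experiment Setup" rows above, here is a minimal NumPy sketch of the distilled SGLD idea (Algorithm 1): an SGLD "teacher" update using the quoted hyper-parameters (η_t = 4 × 10^-6, τ = 100, B = 1000, λ = 1, M = 100), followed by a distillation step that trains a "student" to match the teacher's predictive distribution. The sketch assumes a softmax (multiclass logistic-regression) model in place of the paper's neural networks; the synthetic data, the student step size, the input-perturbation scheme, and all names are illustrative assumptions, not taken from the authors' Torch code.

```python
# Hedged sketch of distilled SGLD: a softmax-regression stand-in for the
# paper's networks, with the hyperparameters quoted in the table above.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: N examples, D features, C classes (illustrative only).
N, D, C = 10_000, 20, 10
X = rng.normal(size=(N, D))
W_true = rng.normal(size=(C, D))
y = np.argmax(X @ W_true.T + rng.gumbel(size=(N, C)), axis=1)

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def onehot(labels, num_classes):
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

eta, tau, B, lam, M = 4e-6, 100, 1000, 1.0, 100   # hyper-parameters quoted above
T = 5000                     # total SGLD iterations (illustrative)
theta = np.zeros((C, D))     # teacher parameters (one SGLD chain)
w = np.zeros((C, D))         # student parameters
student_lr = 0.1             # student step size (assumed, not from the paper)

for t in range(T):
    # --- SGLD step on the teacher ---
    idx = rng.choice(N, size=M, replace=False)
    Xb, Yb = X[idx], onehot(y[idx], C)
    probs = softmax(Xb @ theta.T)
    grad_loglik = (Yb - probs).T @ Xb        # minibatch gradient of sum log p(y|x, theta)
    grad_logprior = -lam * theta             # Gaussian prior with precision lam
    drift = 0.5 * eta * (grad_logprior + (N / M) * grad_loglik)
    noise = rng.normal(scale=np.sqrt(eta), size=theta.shape)   # injected noise, var = eta
    theta = theta + drift + noise

    # --- Distillation step on the student (after burn-in, on thinned samples) ---
    if t >= B and (t - B) % tau == 0:
        jdx = rng.choice(N, size=M, replace=False)
        Xd = X[jdx] + rng.normal(scale=0.1, size=(M, D))   # perturbed inputs (assumed scheme)
        p_teacher = softmax(Xd @ theta.T)                  # soft targets from current sample
        p_student = softmax(Xd @ w.T)
        # Gradient of the cross-entropy between teacher and student predictions.
        w = w - student_lr * (p_student - p_teacher).T @ Xd / M

agreement = (np.argmax(X @ w.T, axis=1) == np.argmax(X @ theta.T, axis=1)).mean()
print("student/teacher argmax agreement:", agreement)
```

In the paper itself, both teacher and student are neural networks, and the student is trained online so that it approximates the Monte Carlo average of the teacher's predictive distribution over posterior samples rather than any single sample.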