Good Initializations of Variational Bayes for Deep Models
Authors: Simone Rossi, Pietro Michiardi, Maurizio Filippone
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed method is extensively validated on regression and classification tasks, including Bayesian Deep Nets and Conv Nets, showing faster and better convergence compared to alternatives inspired by the literature on initializations for loss minimization. |
| Researcher Affiliation | Academia | Simone Rossi, Pietro Michiardi, Maurizio Filippone; Department of Data Science, EURECOM, France. Correspondence to: Simone Rossi <simone.rossi@eurecom.fr>. |
| Pseudocode | Yes | Algorithm 1: Sketch of the I-BLM Initializer (a hedged code sketch follows the table) |
| Open Source Code | No | The paper does not provide an explicit statement or link to the open-source code for the described methodology. A GitHub link is mentioned, but it refers to replicating results from a different paper. |
| Open Datasets | Yes | We tested I-BLM with classification problems on MNIST (n = 70000, d = 784), EEG (n = 14980, d = 14), CREDIT (n = 1000, d = 24) and SPAM (n = 4601, d = 57). We tested our framework on MNIST and on CIFAR10. |
| Dataset Splits | No | The paper mentions 'train/test splits' for some experiments, but does not explicitly specify validation splits or their sizes. |
| Hardware Specification | Yes | All experiments are run on a server equipped with two 16c/32t Intel Xeon CPU and four NVIDIA Tesla P100, with a maximum time budget of 24 hours (never reached). |
| Software Dependencies | No | The paper mentions software components such as the Adam optimizer and PyTorch, but does not provide specific version numbers for these or other dependencies needed for replication. |
| Experiment Setup | Yes | Throughout the experiments, we use the ADAM optimizer (Kingma & Ba, 2015) with learning rate 10⁻³, batch size 64, and 16 Monte Carlo samples at training time and 128 at test time. The architecture used in these experiments has one single hidden layer with 100 hidden neurons and ReLU activations. (A hedged code sketch of this setup follows the table.) |
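The Pseudocode row points to Algorithm 1, the I-BLM initializer. As a reading aid, here is a minimal NumPy sketch of the conjugate Bayesian-linear-model posterior that such a layer-wise initializer builds on, used to set the mean and per-weight variances of a mean-field variational posterior for one layer. The iterative part of I-BLM (how per-layer targets are constructed and how activations are propagated through already-initialized layers) follows Algorithm 1 in the paper and is only summarized in comments; the function names and the `alpha`/`beta` hyperparameters here are illustrative, not the authors' code.

```python
import numpy as np

def bayesian_linear_posterior(X, Y, alpha=1.0, beta=1.0):
    """Conjugate posterior of a Bayesian linear model with weight prior N(0, alpha^-1 I)
    and Gaussian observation noise with precision beta (standard textbook result)."""
    d = X.shape[1]
    precision = alpha * np.eye(d) + beta * X.T @ X   # posterior precision
    cov = np.linalg.inv(precision)                   # posterior covariance
    mean = beta * cov @ X.T @ Y                      # posterior mean, one column per output
    return mean, cov

def init_variational_layer(H, T, alpha=1.0, beta=1.0):
    """Set a mean-field Gaussian q(W) for one layer from the Bayesian linear model
    fit of the layer inputs H to layer targets T. In I-BLM this step is applied
    layer by layer, with H obtained by propagating the data through the layers
    initialized so far (see Algorithm 1 in the paper for the exact procedure)."""
    mean, cov = bayesian_linear_posterior(H, T, alpha, beta)
    var = np.tile(np.diag(cov)[:, None], (1, mean.shape[1]))  # per-weight marginal variances
    return mean, var

# Toy usage: initialize q(W) for a layer with 5 inputs and 3 outputs.
rng = np.random.default_rng(0)
H = rng.standard_normal((64, 5))   # mini-batch of layer inputs
T = rng.standard_normal((64, 3))   # per-layer targets (paper-specific construction not shown)
q_mean, q_var = init_variational_layer(H, T)
print(q_mean.shape, q_var.shape)   # (5, 3) (5, 3)
```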
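The Experiment Setup row lists the optimizer and architecture hyperparameters. Below is a minimal PyTorch sketch wiring up just those stated values, assuming MNIST-sized inputs (784 features, 10 classes, per the Open Datasets row); the actual models use variational (Bayesian) layers trained with Monte Carlo sampling, which this sketch does not reproduce.

```python
import torch
import torch.nn as nn

# Hyperparameters as reported in the Experiment Setup row.
LEARNING_RATE = 1e-3       # ADAM learning rate
BATCH_SIZE = 64            # mini-batch size
MC_SAMPLES_TRAIN = 16      # Monte Carlo samples per step at training time
MC_SAMPLES_TEST = 128      # Monte Carlo samples at test time

# Single hidden layer with 100 ReLU units; plain nn.Linear layers stand in for the
# paper's variational layers, and the 784/10 dimensions assume MNIST inputs and classes.
model = nn.Sequential(
    nn.Linear(784, 100),
    nn.ReLU(),
    nn.Linear(100, 10),
)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
```

In the paper's setting, each training step would additionally draw MC_SAMPLES_TRAIN weight samples from the variational posterior and average the resulting likelihood terms; only the fixed hyperparameter values above are taken from the paper.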