The Deep Weight Prior
Authors: Andrei Atanov, Arsenii Ashukha, Kirill Struminsky, Dmitry Vetrov, Max Welling
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we show that dwp improves the performance of Bayesian neural networks when training data are limited, and initialization of weights with samples from dwp accelerates training of conventional convolutional neural networks. |
| Researcher Affiliation | Collaboration | Andrei Atanov (Skolkovo Institute of Science and Technology; Samsung-HSE Laboratory, National Research University Higher School of Economics) ai.atanow@gmail.com; Arsenii Ashukha (Samsung AI Center Moscow) ars.ashuha@gmail.com; Kirill Struminsky (Skolkovo Institute of Science and Technology; National Research University Higher School of Economics) k.struminsky@gmail.com; Dmitry Vetrov (Samsung AI Center Moscow; Samsung-HSE Laboratory, National Research University Higher School of Economics) vetrovd@yandex.ru; Max Welling (University of Amsterdam; Canadian Institute for Advanced Research) m.welling@uva.nl |
| Pseudocode | Yes | Algorithm 1: Stochastic Variational Inference With Implicit Prior Distribution (a hedged sketch of this procedure is given after the table) |
| Open Source Code | Yes | The code is available at https://github.com/bayesgroup/deep-weight-prior |
| Open Datasets | Yes | In our experiments we used MNIST (LeCun et al., 1998), notMNIST (Bulatov, 2011), CIFAR-10 and CIFAR-100 (Krizhevsky & Hinton, 2009) datasets. (A dataset-loading sketch follows the table.) |
| Dataset Splits | No | The paper mentions using different sizes of training sets and discusses 'test accuracy' but does not explicitly provide details about train/validation/test splits, specific percentages, or sample counts for each split. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run its experiments, such as CPU or GPU models, or memory specifications. |
| Software Dependencies | No | The paper states 'Experiments were implemented using PyTorch (Paszke et al., 2017).' and 'For optimization we used Adam (Kingma & Ba, 2014)'. While PyTorch is named, a specific version number is not provided, and Adam is an optimizer rather than a software dependency with a version. |
| Experiment Setup | Yes | For optimization we used Adam (Kingma & Ba, 2014) with default hyperparameters. We used a neural network with two convolutional layers with 32, 128 filters of shape 7x7, 5x5 respectively, followed by one linear layer with 10 neurons. On the CIFAR dataset we used a neural network with four convolutional layers with 128, 256, 256 filters of shape 7x7, 5x5, 5x5 respectively, followed by two fully connected layers with 512 and 10 neurons. We used a max-pooling layer (Nagi et al., 2011) after the first convolutional layer. All layers were divided with leaky ReLU nonlinearities (Nair & Hinton, 2010). We trained prior distributions on a number of source networks which were learned from different initial points on notMNIST and CIFAR-100 datasets for MNIST and CIFAR-10 experiments respectively. Appendices F and H provide further architectural details and training parameters such as '300 epochs, Adam optimizer with linear learning rate decay from 1e-3 to 0.' (A PyTorch sketch of the MNIST architecture and training schedule follows the table.) |
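
For the pseudocode row above, the following is a minimal sketch of the procedure Algorithm 1 describes: stochastic variational inference in which the intractable term involving the implicit deep weight prior is replaced by an auxiliary lower bound computed with the kernel VAE's encoder r(z|w) and decoder p(w|z). The interface assumed here (`sample_kernels`, a forward pass that accepts sampled kernels, the `dwp_elbo_estimate` name) is illustrative and not taken from the authors' released code.

```python
import math
import torch
import torch.nn.functional as F


def diag_gaussian_log_prob(x, mu, log_sigma):
    """Log-density of a diagonal Gaussian, summed over all dimensions."""
    return (-0.5 * ((x - mu) / log_sigma.exp()) ** 2
            - log_sigma - 0.5 * math.log(2 * math.pi)).sum()


def dwp_elbo_estimate(model, encoder, decoder, x, y, n_data):
    """Single-sample estimate of the variational lower bound with the implicit DWP prior.

    model   -- hypothetical Bayesian conv net; sample_kernels() draws w ~ q(w) with the
               reparameterization trick and returns the kernels plus the summed log q(w)
    encoder -- r(z | w): maps a flattened kernel to (mu_z, log_sigma_z)
    decoder -- p(w | z): maps a latent code to (mu_w, log_sigma_w)
    """
    # 1) Sample weights from the variational posterior q(w).
    kernels, log_q = model.sample_kernels()

    # 2) Expected log-likelihood, rescaled from the minibatch to the full dataset.
    logits = model(x, kernels)
    log_lik = -F.cross_entropy(logits, y, reduction='mean') * n_data

    # 3) Auxiliary lower bound on the implicit prior term:
    #    log p(w) >= E_{r(z|w)}[ log p(w|z) + log p(z) - log r(z|w) ].
    log_prior = torch.zeros(())
    for w in kernels:
        w_flat = w.reshape(1, -1)
        mu_z, log_sig_z = encoder(w_flat)
        z = mu_z + torch.randn_like(mu_z) * log_sig_z.exp()   # z ~ r(z|w)
        mu_w, log_sig_w = decoder(z)
        log_prior = (log_prior
                     + diag_gaussian_log_prob(w_flat, mu_w, log_sig_w)            # log p(w|z)
                     + diag_gaussian_log_prob(z, torch.zeros_like(z),
                                              torch.zeros_like(z))                # log p(z) = N(0, I)
                     - diag_gaussian_log_prob(z, mu_z, log_sig_z))                # log r(z|w)

    # Maximize: data term + prior term - (sampled) negative entropy of q.
    return log_lik + log_prior - log_q
```

A single Monte Carlo sample is used for both w and z, in line with the reparameterization-trick estimators the paper builds on; in practice the entropy of the factorized Gaussian q could also be computed in closed form.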
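
The datasets listed in the open-datasets row are publicly available; the snippet below is a minimal loading sketch using torchvision. notMNIST is not bundled with torchvision, so the path and folder layout used for it are assumptions.

```python
from torchvision import datasets, transforms

transform = transforms.ToTensor()

mnist    = datasets.MNIST(root='data', train=True, download=True, transform=transform)
cifar10  = datasets.CIFAR10(root='data', train=True, download=True, transform=transform)
cifar100 = datasets.CIFAR100(root='data', train=True, download=True, transform=transform)
# notMNIST requires a manual download, assumed here to be unpacked into one folder per class.
notmnist = datasets.ImageFolder(root='data/notMNIST_small', transform=transform)
```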
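
The experiment-setup row specifies the MNIST classifier in enough detail for a hedged PyTorch sketch, shown below together with the quoted Adam schedule. The class name, padding choices, pooling size, flattened feature dimension, and the pairing of the 300-epoch linear decay with this particular network are assumptions, not details confirmed by the paper.

```python
import torch
import torch.nn as nn


class ConvNetMNIST(nn.Module):
    """Two conv layers (32 and 128 filters, 7x7 and 5x5), max-pooling after the
    first one, leaky ReLU nonlinearities, and a 10-unit linear head."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=7),    # 28x28 -> 22x22 (no padding assumed)
            nn.MaxPool2d(2),                    # 22x22 -> 11x11 (pooling size assumed)
            nn.LeakyReLU(),
            nn.Conv2d(32, 128, kernel_size=5),  # 11x11 -> 7x7
            nn.LeakyReLU(),
        )
        self.classifier = nn.Linear(128 * 7 * 7, num_classes)

    def forward(self, x):
        h = self.features(x)
        return self.classifier(h.flatten(1))


model = ConvNetMNIST()
epochs = 300
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # Adam with default hyperparameters
# Linear decay of the learning rate from 1e-3 towards 0 over training.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: max(0.0, 1.0 - epoch / epochs))
```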