The Deep Weight Prior

Authors: Andrei Atanov, Arsenii Ashukha, Kirill Struminsky, Dmitry Vetrov, Max Welling

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, we show that dwp improves the performance of Bayesian neural networks when training data are limited, and initialization of weights with samples from dwp accelerates training of conventional convolutional neural networks.
Researcher Affiliation | Collaboration | Andrei Atanov (Skolkovo Institute of Science and Technology; Samsung-HSE Laboratory, National Research University Higher School of Economics; ai.atanow@gmail.com); Arsenii Ashukha (Samsung AI Center Moscow; ars.ashuha@gmail.com); Kirill Struminsky (Skolkovo Institute of Science and Technology; National Research University Higher School of Economics; k.struminsky@gmail.com); Dmitry Vetrov (Samsung AI Center Moscow; Samsung-HSE Laboratory, National Research University Higher School of Economics; vetrovd@yandex.ru); Max Welling (University of Amsterdam; Canadian Institute for Advanced Research; m.welling@uva.nl)
Pseudocode | Yes | Algorithm 1: Stochastic Variational Inference With Implicit Prior Distribution. A code sketch of this procedure follows the table.
Open Source Code | Yes | The code is available at https://github.com/bayesgroup/deep-weight-prior
Open Datasets | Yes | In our experiments we used MNIST (LeCun et al., 1998), NotMNIST (Bulatov, 2011), CIFAR-10 and CIFAR-100 (Krizhevsky & Hinton, 2009) datasets. A loading sketch for the public splits follows the table.
Dataset Splits | No | The paper mentions using different sizes of training sets and reports test accuracy, but does not explicitly provide train/validation/test splits, percentages, or sample counts for each split.
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as CPU or GPU models or memory specifications.
Software Dependencies | No | The paper states 'Experiments were implemented using PyTorch (Paszke et al., 2017)' and 'For optimization we used Adam (Kingma & Ba, 2014)'. While PyTorch is named, a specific version number is not provided, and Adam is an optimizer rather than a versioned software dependency.
Experiment Setup | Yes | For optimization we used Adam (Kingma & Ba, 2014) with default hyperparameters. We used a neural network with two convolutional layers with 32 and 128 filters of shape 7x7 and 5x5 respectively, followed by one linear layer with 10 neurons. On the CIFAR dataset we used a neural network with four convolutional layers with 128, 256, 256 filters of shape 7x7, 5x5, 5x5 respectively, followed by two fully connected layers with 512 and 10 neurons. We used a max-pooling layer (Nagi et al., 2011) after the first convolutional layer, and all layers were separated by leaky ReLU nonlinearities (Nair & Hinton, 2010). We trained prior distributions on a number of source networks learned from different initial points on NotMNIST and CIFAR-100 for the MNIST and CIFAR-10 experiments respectively. Appendix F and H provide further architectural details and training parameters such as '300 epochs, Adam optimizer with linear learning rate decay from 1e-3 to 0'. A sketch of the MNIST architecture and this learning-rate schedule follows the table.
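
Algorithm 1 ('Stochastic Variational Inference With Implicit Prior Distribution') combines a reparameterized sample of the kernels from the variational posterior q(w) with a variational lower bound on the intractable log-density of the implicit prior, evaluated through the prior's auto-encoder. Below is a minimal sketch of one loss evaluation, assuming a factorized Gaussian posterior and a pretrained Gaussian encoder/decoder pair; `model`, `encoder`, `decoder`, `q_mu`, `q_logvar` and `n_train` are illustrative names, not the authors' API.

```python
# A minimal sketch of one step of Algorithm 1, assuming a pretrained VAE over
# kernels: `encoder` maps a kernel tensor w to (mu_z, logvar_z) of r(z | w) and
# `decoder` maps z to (mu_w, logvar_w) of p(w | z). Names are illustrative.
import math
import torch
import torch.nn.functional as F

def gaussian_log_prob(x, mu, logvar):
    """Diagonal-Gaussian log-density, summed over all dimensions."""
    return (-0.5 * (logvar + (x - mu) ** 2 / logvar.exp()
                    + math.log(2 * math.pi))).sum()

def negative_elbo(x, y, model, q_mu, q_logvar, encoder, decoder, n_train):
    # 1) Reparameterized sample of the kernels from q(w).
    w = q_mu + torch.randn_like(q_mu) * (0.5 * q_logvar).exp()

    # 2) Single-sample estimate of the expected log-likelihood, rescaled from
    #    the minibatch mean to the full training set.
    logits = model(x, w)  # hypothetical: a network that uses the sampled kernels
    log_lik = -F.cross_entropy(logits, y) * n_train

    # 3) Entropy of the factorized Gaussian posterior q(w).
    entropy = 0.5 * (q_logvar + math.log(2 * math.pi) + 1).sum()

    # 4) Lower bound on log p(w) under the implicit prior:
    #    log p(w) >= E_{r(z|w)}[ log p(w|z) + log p(z) - log r(z|w) ].
    mu_z, logvar_z = encoder(w)
    z = mu_z + torch.randn_like(mu_z) * (0.5 * logvar_z).exp()
    mu_w, logvar_w = decoder(z)
    log_prior = (gaussian_log_prob(w, mu_w, logvar_w)
                 + gaussian_log_prob(z, torch.zeros_like(z), torch.zeros_like(z))
                 - gaussian_log_prob(z, mu_z, logvar_z))

    # Return the negative ELBO as the loss to minimize.
    return -(log_lik + log_prior + entropy)
```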
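
For reference, the public datasets named above ship with torchvision's standard loaders; NotMNIST does not and would need a separate download. The snippet below only fetches the default train/test splits and does not reproduce the paper's limited-data subsets, which are unspecified.

```python
# Standard torchvision loaders for the public datasets (NotMNIST is omitted:
# it is not bundled with torchvision). Split sizes used in the paper are not
# specified, so only the default train/test splits are shown.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

mnist_train = datasets.MNIST("data", train=True, download=True, transform=to_tensor)
mnist_test = datasets.MNIST("data", train=False, download=True, transform=to_tensor)
cifar10_train = datasets.CIFAR10("data", train=True, download=True, transform=to_tensor)
cifar100_train = datasets.CIFAR100("data", train=True, download=True, transform=to_tensor)
```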
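
The MNIST classifier quoted in the setup (two convolutional layers with 32 and 128 filters of size 7x7 and 5x5, max-pooling after the first convolution, leaky-ReLU nonlinearities, a single 10-way linear layer) can be sketched as follows. Padding, strides and the pooling window are not stated, so no padding and a 2x2 pool are assumed here; the optimizer lines mirror the quoted '300 epochs, Adam optimizer with linear learning rate decay from 1e-3 to 0'.

```python
# Sketch of the two-layer MNIST network and the quoted training schedule.
# Assumptions (not given in the summary): no padding, 2x2 max-pooling.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=7),    # 28x28 -> 22x22
    nn.MaxPool2d(2),                    # 22x22 -> 11x11
    nn.LeakyReLU(),
    nn.Conv2d(32, 128, kernel_size=5),  # 11x11 -> 7x7
    nn.LeakyReLU(),
    nn.Flatten(),
    nn.Linear(128 * 7 * 7, 10),         # one linear layer with 10 neurons
)

# Adam with default hyperparameters and a linear learning-rate decay from 1e-3
# to 0 over 300 epochs; call scheduler.step() once per epoch.
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda epoch: max(0.0, 1.0 - epoch / 300)
)
```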