Variational Dropout Sparsifies Deep Neural Networks

Authors: Dmitry Molchanov, Arsenii Ashukha, Dmitry Vetrov

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform experiments on classification tasks and use different neural network architectures including architectures with a combination of batch normalization and dropout layers.
Researcher Affiliation | Collaboration | Yandex, Russia; Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow, Russia; National Research University Higher School of Economics, Moscow, Russia; Moscow Institute of Physics and Technology, Moscow, Russia.
Pseudocode | No | The paper provides mathematical expressions for its calculations (e.g., equations 17 and 18) but does not include a clearly labeled pseudocode block or algorithm steps.
Open Source Code | Yes | Lasagne and PyTorch source code of Sparse Variational Dropout layers is available at https://goo.gl/2D4tFW.
Open Datasets | Yes | We compare our method with other methods of training sparse neural networks on the MNIST dataset using a fully-connected architecture LeNet-300-100 and a convolutional architecture LeNet-5-Caffe. We use CIFAR-10 and CIFAR-100 for evaluation.
Dataset Splits | No | The paper mentions training and testing on datasets such as MNIST and CIFAR but does not explicitly describe a separate validation split, its size, or the methodology used for hyperparameter tuning.
Hardware Specification | No | The paper does not describe the hardware (e.g., specific GPU/CPU models, memory, or cloud instance types) used to run its experiments.
Software Dependencies | No | The paper names Lasagne and PyTorch as frameworks and Adam as the optimizer but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | When we start from a random initialization, we train for 200 epochs and linearly decay the learning rate from 10^-4 to zero. When we start from a pre-trained model, we finetune for 10-30 epochs with learning rate 10^-5. We train all networks using Adam (Kingma & Ba, 2014).
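
The schedule quoted in the Experiment Setup row maps onto a standard optimizer configuration. Below is a minimal, hypothetical PyTorch sketch of the from-scratch setting (Adam, 200 epochs, learning rate decayed linearly from 10^-4 to zero); the stand-in model and random data are placeholders for illustration only and do not use the authors' released Sparse Variational Dropout layers.

```python
# Minimal sketch of the training schedule quoted above, assuming a stand-in
# model and random data; the authors' released Lasagne/PyTorch code is not used.
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for LeNet-300-100 on flattened 28x28 MNIST images.
model = nn.Sequential(
    nn.Linear(784, 300), nn.ReLU(),
    nn.Linear(300, 100), nn.ReLU(),
    nn.Linear(100, 10),
)

# Random placeholder data so the sketch runs end-to-end; swap in MNIST.
train_loader = DataLoader(
    TensorDataset(torch.randn(256, 784), torch.randint(0, 10, (256,))),
    batch_size=64, shuffle=True,
)

epochs = 200    # from-scratch setting reported in the paper
base_lr = 1e-4  # decayed linearly to zero over training
optimizer = optim.Adam(model.parameters(), lr=base_lr)
scheduler = optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: 1.0 - epoch / epochs)
criterion = nn.CrossEntropyLoss()

for epoch in range(epochs):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        # With Sparse Variational Dropout layers, the (approximate) KL
        # regularizer would be added to the loss at this point.
        loss.backward()
        optimizer.step()
    scheduler.step()  # linear decay: factor goes from 1 toward 0 over 200 epochs
```

For the fine-tuning setting described in the same row, the loop would instead start from a pre-trained model and run for 10-30 epochs at a fixed learning rate of 10^-5.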