Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Variational Dropout Sparsifies Deep Neural Networks
Authors: Dmitry Molchanov, Arsenii Ashukha, Dmitry Vetrov
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform experiments on classification tasks and use different neural network architectures including architectures with a combination of batch normalization and dropout layers. |
| Researcher Affiliation | Collaboration | (1) Yandex, Russia; (2) Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow, Russia; (3) National Research University Higher School of Economics, Moscow, Russia; (4) Moscow Institute of Physics and Technology, Moscow, Russia. |
| Pseudocode | No | The paper provides mathematical expressions for calculations (e.g., equations 17 and 18) but does not include a clearly labeled pseudocode block or algorithm steps. |
| Open Source Code | Yes | Lasagne and PyTorch source code of Sparse Variational Dropout layers is available at https://goo.gl/2D4tFW. |
| Open Datasets | Yes | We compare our method with other methods of training sparse neural networks on the MNIST dataset using a fully-connected architecture LeNet-300-100 and a convolutional architecture LeNet-5-Caffe. We use CIFAR-10 and CIFAR-100 for evaluation. |
| Dataset Splits | No | The paper mentions training and testing on datasets like MNIST and CIFAR, but does not explicitly state the use of a separate validation dataset split or its size/methodology for hyperparameter tuning. |
| Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., specific GPU/CPU models, memory details, or cloud instance types) used to run its experiments. |
| Software Dependencies | No | The paper mentions 'Lasagne and PyTorch' as frameworks and 'Adam' as an optimizer but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | When we start from a random initialization, we train for 200 epochs and linearly decay the learning rate from $10^{-4}$ to zero. When we start from a pre-trained model, we finetune for 10-30 epochs with learning rate $10^{-5}$. We train all networks using Adam (Kingma & Ba, 2014). |
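
The linear learning-rate decay quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `linear_decay_lr` is hypothetical, and the sketch assumes the rate is updated once per epoch and reaches zero exactly after the final epoch. The paper excerpt does not say whether the decay is applied per epoch or per optimizer step.

```python
def linear_decay_lr(epoch: int, total_epochs: int = 200, base_lr: float = 1e-4) -> float:
    """Learning rate at the start of `epoch` (0-indexed), decayed linearly to zero.

    Assumption: per-epoch updates; the rate hits zero at epoch == total_epochs.
    """
    if not 0 <= epoch <= total_epochs:
        raise ValueError("epoch must lie in [0, total_epochs]")
    return base_lr * (1.0 - epoch / total_epochs)


# Per-epoch schedule for the from-scratch setting (200 epochs, starting at 1e-4).
schedule = [linear_decay_lr(e) for e in range(200)]

# Fine-tuning from a pre-trained model uses a constant rate instead.
FINETUNE_LR = 1e-5
```

In a training loop, the value for the current epoch would be written into the Adam optimizer's learning rate before that epoch's updates. The fine-tuning setting needs no schedule, since the quoted rate is constant.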