Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks

Authors: Alexander Shekhovtsov, Viktor Yanush, Boris Flach

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We experimentally show higher accuracy in gradient estimation and demonstrate more stable and better-performing training in deep convolutional models with both proposed methods."
Researcher Affiliation | Academia | Alexander Shekhovtsov (Czech Technical University in Prague, shekhovt@cmp.felk.cvut.cz); Viktor Yanush (Lomonosov Moscow State University, yanushviktor@gmail.com); Boris Flach (Czech Technical University in Prague, flachbor@cmp.felk.cvut.cz)
Pseudocode | Yes | Algorithm 1: Path Sample-Analytic (PSA); Algorithm 2: Straight-Through (ST). A generic ST sketch follows the table.
Open Source Code | Yes | "The implementation is available at https://github.com/shekhovt/PSA-Neurips2020."
Open Datasets | Yes | "To test the proposed methods in a realistic learning setting we use the CIFAR-10 dataset and a network with 8 convolutional and 1 fully connected layers (Appendix C)."
Dataset Splits | No | The paper reports validation accuracy but gives no percentages or counts for the validation split in the main text or the appendices.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | The paper mentions PyTorch but does not state its version or the versions of any other software dependencies, which reproducibility requires.
Experiment Setup | Yes | "All models are trained with SGD with momentum (0.9) and a batch size of 256. We train for 2000 epochs, linearly decaying the learning rate to zero during the last 500 epochs. Each method's learning rate is fine-tuned by an automated search on a log-uniform grid from 10^-5 to 0.1, choosing the highest learning rate that still yields stable training (after 50 epochs)." A schedule sketch follows the table.
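The Pseudocode row names the paper's two algorithms. Their exact forms are specific to the paper, but Algorithm 2 builds on the standard straight-through rule, which can be sketched in a few lines of PyTorch: sample a binary activation on the forward pass, and differentiate through the mean on the backward pass. This is a minimal illustration of that generic rule, not a reproduction of the paper's Algorithm 2; the class name `StraightThroughBernoulli` is invented for this sketch.

```python
import torch

class StraightThroughBernoulli(torch.autograd.Function):
    """One common straight-through variant for a stochastic binary unit."""

    @staticmethod
    def forward(ctx, a):
        p = torch.sigmoid(a)                 # firing probability
        x = 2.0 * torch.bernoulli(p) - 1.0   # sample a state in {-1, +1}
        ctx.save_for_backward(p)
        return x

    @staticmethod
    def backward(ctx, grad_out):
        (p,) = ctx.saved_tensors
        # Straight-through: ignore the sampling step and differentiate the
        # mean E[x] = 2*sigmoid(a) - 1, whose derivative is 2*p*(1-p).
        return grad_out * 2.0 * p * (1.0 - p)
```

A unit like this would be invoked as `x = StraightThroughBernoulli.apply(pre_activation)` inside a larger network.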
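The Experiment Setup row is specific enough to translate into code. Below is a minimal PyTorch sketch of that schedule: SGD with momentum 0.9, batch size 256, 2000 epochs with linear learning-rate decay to zero over the last 500, and the log-uniform learning-rate grid from 10^-5 to 0.1. The model and data are stand-ins rather than the paper's 8-conv + 1-FC CIFAR-10 network, and the 50-epoch stability check from the automated search is omitted.

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import LambdaLR

EPOCHS, DECAY_START = 2000, 1500   # linear LR decay over the last 500 epochs
BATCH_SIZE = 256

def linear_tail_decay(epoch: int) -> float:
    """LR multiplier: 1.0 until epoch 1500, then linear decay to 0 at epoch 2000."""
    if epoch < DECAY_START:
        return 1.0
    return (EPOCHS - epoch) / (EPOCHS - DECAY_START)

# Log-uniform grid from 1e-5 to 0.1 for the automated LR search; the paper
# picks the highest LR on this grid that still trains stably after 50 epochs.
lr_grid = torch.logspace(-5, -1, steps=9).tolist()

# Stand-in model and data (the paper uses an 8-conv + 1-FC network on CIFAR-10).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
inputs = torch.randn(BATCH_SIZE, 3, 32, 32)
targets = torch.randint(0, 10, (BATCH_SIZE,))
criterion = nn.CrossEntropyLoss()

optimizer = torch.optim.SGD(model.parameters(), lr=lr_grid[-1], momentum=0.9)
scheduler = LambdaLR(optimizer, lr_lambda=linear_tail_decay)

for epoch in range(EPOCHS):
    optimizer.zero_grad()                        # one stand-in batch per "epoch"
    criterion(model(inputs), targets).backward()
    optimizer.step()
    scheduler.step()                             # advance the linear-decay schedule
```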