ARM: Augment-REINFORCE-Merge Gradient for Stochastic Binary Networks

Authors: Mingzhang Yin, Mingyuan Zhou

ICLR 2019

Reproducibility variables, results, and LLM responses:
Research Type: Experimental
LLM response: "Experimental results show the ARM estimator provides state-of-the-art performance in auto-encoding variational inference and maximum likelihood estimation, for discrete latent variable models with one or multiple stochastic binary layers."
Researcher Affiliation: Academia
LLM response: "Mingzhang Yin, Department of Statistics and Data Sciences, The University of Texas at Austin, Austin, TX 78712, mzyin@utexas.edu; Mingyuan Zhou, Department of IROM, McCombs School of Business, The University of Texas at Austin, Austin, TX 78712, mingyuan.zhou@mccombs.utexas.edu"
Pseudocode: Yes
LLM response: "Algorithm 1: ARM gradient for a V-dimensional binary latent vector; Algorithm 2: ARM gradient for a T-stochastic-hidden-layer binary network"
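As a minimal NumPy sketch of the single-layer estimator that Algorithm 1 describes (our rendering, not the authors' code; the function names and the toy check are ours): for z ~ Bernoulli(sigmoid(phi)), ARM estimates the gradient from the objective evaluated at two antithetic binary configurations, scaled by (u - 1/2), with no gradient of f required.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def arm_gradient(f, phi):
    """Single-sample ARM estimate of d/dphi E_{z ~ Bernoulli(sigmoid(phi))}[f(z)].

    f   : maps a binary vector in {0,1}^V to a scalar objective value.
    phi : length-V vector of Bernoulli logits.
    """
    u = rng.uniform(size=phi.shape)              # u ~ Uniform(0,1)^V
    z_swap = (u > sigmoid(-phi)).astype(float)   # 1[u > sigma(-phi)]
    z_true = (u < sigmoid(phi)).astype(float)    # 1[u < sigma(phi)]
    return (f(z_swap) - f(z_true)) * (u - 0.5)   # unbiased single-sample estimate

# Toy check: for f(z) = sum(z), the exact gradient is sigmoid(phi) * (1 - sigmoid(phi)).
phi = np.array([0.3, -1.2, 2.0])
est = np.mean([arm_gradient(np.sum, phi) for _ in range(100_000)], axis=0)
print(est)   # should approach sigmoid(phi) * (1 - sigmoid(phi))
```

Algorithm 2 chains this estimator through T stochastic hidden layers; it is not sketched here.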
Open Source Code: Yes
LLM response: "Python code for reproducible research is available at https://github.com/mingzhang-yin/ARM-gradient."
Open Datasets: Yes
LLM response: "We consider a widely used binarization (Salakhutdinov & Murray, 2008; Larochelle & Murray, 2011; Yin & Zhou, 2018), referred to as MNIST-static and available at http://www.dmi.usherb.ca/~larocheh/mlpython/_modules/datasets/binarized_mnist.html. In addition to MNIST-static, we also consider MNIST-threshold (van den Oord et al., 2017), which binarizes MNIST by thresholding each pixel value at 0.5, and the binarized OMNIGLOT dataset."
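The MNIST-threshold rule quoted above reduces to a one-line deterministic binarization; a sketch (the function name is ours, and the handling of a pixel exactly at 0.5 is our assumption):

```python
import numpy as np

def binarize_threshold(x, threshold=0.5):
    """Deterministically binarize pixel intensities in [0, 1] at a threshold.
    Tie-handling at exactly 0.5 is an assumption; the excerpt does not specify it."""
    return (x > threshold).astype(np.float32)
```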
Dataset Splits: Yes
LLM response: "For each dataset, using its default training/validation/testing partition, we train all methods on the training set, calculate the validation log-likelihood for every epoch, and report the test negative log-likelihood when the validation negative log-likelihood reaches its minimum within a predefined maximum number of iterations."
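The quoted protocol is early model selection on validation negative log-likelihood (NLL); a schematic sketch with placeholder callables (train_epoch, val_nll, and test_nll are illustrative names, not from the paper or repository):

```python
def select_by_validation(train_epoch, val_nll, test_nll, max_epochs):
    """Report the test NLL recorded at the epoch of minimum validation NLL."""
    best_val, reported_test = float("inf"), None
    for _ in range(max_epochs):
        train_epoch()                   # one pass over the training split
        current = val_nll()             # validation NLL after this epoch
        if current < best_val:          # new validation minimum ...
            best_val = current
            reported_test = test_nll()  # ... so snapshot the test NLL
    return reported_test
```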
Hardware Specification: Yes
LLM response: "The authors acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research, and the computational support of Texas Advanced Computing Center."
Software Dependencies: No
LLM response: "The paper mentions using 'Adam' but does not provide specific version numbers for any software components or libraries."
Experiment Setup: Yes
LLM response: "We maximize a single-Monte-Carlo-sample ELBO using Adam (Kingma & Ba, 2014), with the learning rate selected from {5, 1, 0.5} × 10^-4 by the validation set. We set the batch size as 50 for MNIST and 25 for OMNIGLOT."
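For reference, the quoted settings collected into a single configuration sketch (the dict layout and key names are ours; only the values come from the excerpt):

```python
config = {
    "optimizer": "Adam",                         # Kingma & Ba (2014)
    "elbo_mc_samples": 1,                        # single-Monte-Carlo-sample ELBO
    "learning_rate_grid": [5e-4, 1e-4, 0.5e-4],  # {5, 1, 0.5} x 10^-4, chosen on validation
    "batch_size": {"MNIST": 50, "OMNIGLOT": 25},
}
```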