ARM: Augment-REINFORCE-Merge Gradient for Stochastic Binary Networks
Authors: Mingzhang Yin, Mingyuan Zhou
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show the ARM estimator provides state-of-the-art performance in auto-encoding variational inference and maximum likelihood estimation, for discrete latent variable models with one or multiple stochastic binary layers. |
| Researcher Affiliation | Academia | Mingzhang Yin, Department of Statistics and Data Sciences, The University of Texas at Austin, Austin, TX 78712, mzyin@utexas.edu; Mingyuan Zhou, Department of IROM, McCombs School of Business, The University of Texas at Austin, Austin, TX 78712, mingyuan.zhou@mccombs.utexas.edu |
| Pseudocode | Yes | Algorithm 1: ARM gradient for a V-dimensional binary latent vector; Algorithm 2: ARM gradient for a T-stochastic-hidden-layer binary network (a minimal sketch of the single-layer estimator follows the table) |
| Open Source Code | Yes | Python code for reproducible research is available at https://github.com/mingzhang-yin/ARM-gradient. |
| Open Datasets | Yes | We consider a widely used binarization (Salakhutdinov & Murray, 2008; Larochelle & Murray, 2011; Yin & Zhou, 2018), referred to as MNIST-static and available at http://www.dmi.usherb.ca/~larocheh/mlpython/_modules/datasets/binarized_mnist.html. In addition to MNIST-static, we also consider MNIST-threshold (van den Oord et al., 2017), which binarizes MNIST by thresholding each pixel value at 0.5 (a thresholding sketch appears after the table), and the binarized OMNIGLOT dataset. |
| Dataset Splits | Yes | For each dataset, using its default training/validation/testing partition, we train all methods on the training set, calculate the validation log-likelihood for every epoch, and report the test negative log-likelihood when the validation negative log-likelihood reaches its minimum within a predefined maximum number of iterations. |
| Hardware Specification | Yes | The authors acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research, and the computational support of Texas Advanced Computing Center. |
| Software Dependencies | No | The paper mentions using "Adam" but does not provide specific version numbers for any software components or libraries. |
| Experiment Setup | Yes | We maximize a single-Monte-Carlo-sample ELBO using Adam (Kingma & Ba, 2014), with the learning rate selected from {5, 1, 0.5} × 10⁻⁴ by the validation set (the selection protocol is sketched after the table). We set the batch size as 50 for MNIST and 25 for OMNIGLOT. |
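For reference, here is a minimal NumPy sketch of the single-layer ARM estimator named in the Pseudocode row. The function name `arm_gradient` is illustrative, not from the released code; the estimator itself follows the paper's identity ∇_φ E_{z∼Bernoulli(σ(φ))}[f(z)] = E_{u∼Uniform(0,1)^V}[(f(1[u>σ(−φ)]) − f(1[u<σ(φ)]))(u − 1/2)].

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def arm_gradient(f, phi, rng=np.random.default_rng()):
    """Single-sample ARM estimate of d/dphi E_{z~Bernoulli(sigmoid(phi))}[f(z)]
    for a V-dimensional binary latent vector (Algorithm 1); Algorithm 2 applies
    the same update layer by layer in a multi-layer stochastic binary network."""
    u = rng.uniform(size=np.shape(phi))          # u ~ Uniform(0,1)^V, shared by both arms
    z_swap = (u > sigmoid(-phi)).astype(float)   # arm with the sign of phi swapped
    z = (u < sigmoid(phi)).astype(float)         # ordinary Bernoulli(sigmoid(phi)) sample
    return (f(z_swap) - f(z)) * (u - 0.5)        # unbiased single-sample gradient estimate
```

Averaging many such single-sample estimates recovers the exact gradient; because both arms share the same u, the two function evaluations are strongly correlated, which is where the variance reduction comes from.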
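The MNIST-threshold variant mentioned in the Open Datasets row is straightforward to reproduce. A sketch, assuming pixel intensities in [0, 1] and a non-strict comparison (the paper only says "thresholding each pixel value at 0.5"):

```python
import numpy as np

def binarize_threshold(images, threshold=0.5):
    """MNIST-threshold: map each pixel to {0, 1} by thresholding at 0.5.
    MNIST-static instead uses the fixed pre-binarized files linked above."""
    return (images >= threshold).astype(np.float32)
```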
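Taken together, the Dataset Splits and Experiment Setup rows describe a grid search over learning rates with validation-based early stopping. A schematic sketch; `init_params`, `train_epoch`, and `nll` are hypothetical stand-ins for the actual model code, and the epoch cap is assumed (the paper only says "a predefined maximum number of iterations"):

```python
import numpy as np

LEARNING_RATES = [5e-4, 1e-4, 0.5e-4]  # the paper's {5, 1, 0.5} x 10^-4 grid
MAX_EPOCHS = 200                       # assumed cap; exact value not stated

def select_and_report(init_params, train_epoch, nll):
    """Train at each learning rate, track validation NLL every epoch, and
    report the test NLL from the epoch with the lowest validation NLL."""
    best_valid, reported_test = np.inf, None
    for lr in LEARNING_RATES:
        params = init_params()
        for _ in range(MAX_EPOCHS):
            params = train_epoch(params, lr)       # one Adam pass over the training set
            valid = nll(params, split="valid")     # validation NLL this epoch
            if valid < best_valid:                 # new best: record matching test NLL
                best_valid = valid
                reported_test = nll(params, split="test")
    return reported_test
```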